2.3 Probability Distributions#
Random variables#
Random variables are fundamental in probability theory and statistics, representing numerical outcomes of random phenomena.
In actuarial science, a common example of a random variable is the number of insurance claims received by an insurance company within a certain period, such as a month, denoted as $N$.
$N$ could be 0 if there are no accidents reported by policyholders. It could be 1 if only one policyholder files a claim for a minor accident.
It could be 2 if there are two separate accidents reported, and so forth.
By defining the variable $N$ in this way, we can assign probabilities to each possible number of claims and analyze the claims process quantitatively.
This random variable is essential for actuarial calculations, such as estimating claim frequencies, setting premiums, and determining reserve requirements.
Definition
A random variable is a function that assigns a real number to each outcome in the sample space of a random experiment.
A random variable is like a translator between the real world and numbers. It takes things that can happen, like flipping a coin, and turns them into numbers we can work with.
For example, if we flip a coin, we could have a random variable called “X” that says “0” if it lands on tails and “1” if it lands on heads. So, instead of dealing with the actual outcomes, we use numbers to represent them, making it easier to do calculations and understand probabilities.
Example 2.31: Bernoulli random variable
In actuarial science, a common example of a Bernoulli random variable is whether an insurance policyholder makes a claim within a specified period, such as a month. Let us denote this Bernoulli random variable as $X$.
For instance, consider a scenario where a policyholder either makes a claim ($X = 1$) or does not make a claim ($X = 0$) during the month.
Here, $X$ takes the value 1 with probability $p$, the probability of a claim, and the value 0 with probability $1 - p$.
By analyzing the distribution of $X$ across many policyholders, insurers can estimate claim probabilities and use them in pricing and risk assessment.
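As a small sketch of this idea in R (the claim probability $p = 0.1$ is an assumed illustrative value, not one given above), a Bernoulli claim indicator can be simulated with `rbinom` using `size = 1`:

```r
# Simulate monthly claim indicators X for 10,000 policyholders,
# where X = 1 (claim) with probability p and X = 0 (no claim) otherwise.
# p = 0.1 is an assumed value chosen for illustration.
set.seed(123)
p <- 0.1
x <- rbinom(10000, size = 1, prob = p)

# The sample proportion of claims should be close to p
mean(x)
```

With many simulated policyholders, the observed claim proportion converges to $p$, which is exactly the quantity an insurer estimates from historical claim data.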
Example 2.32: An example of a continuous random variable
An example of a random variable representing waiting time for the next insurance claim in actuarial science can be denoted as $T$.
For instance, suppose a policyholder submits a claim today, and we are interested in the time it takes until the next claim is received. Let us say $T$ measures this waiting time in days.
Here, $T$ is a continuous random variable: it can take any non-negative real value, such as 0.5 days, 3.2 days, or 10.75 days.
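A minimal sketch in R, assuming for illustration that $T$ follows an Exponential distribution with a rate of 2 claims per day (an assumption made here, not a value stated above):

```r
# Simulate waiting times T (in days) between claims, assuming an
# Exponential model with rate lambda = 2 (illustrative assumption).
set.seed(123)
lambda <- 2
t_wait <- rexp(10000, rate = lambda)

# T is continuous: any non-negative value can occur
head(t_wait)

# P(T <= 0.5): the chance the next claim arrives within half a day
pexp(0.5, rate = lambda)
```

Unlike the claim-count example, the simulated values are not restricted to a countable set; any non-negative real waiting time can occur.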
Types of Random Variables#
There are mainly two types of random variables:
Discrete Random Variables
Discrete random variables take on a countable number of distinct values.
Examples include the number of customers in a queue, the number of defective items in a batch, or the outcome of rolling a die.
Continuous Random Variables
Continuous random variables can take on any value within a given range or interval.
They are associated with measurements and are often represented by real numbers.
Examples include the height of a person, the time taken to complete a task, or the temperature of a room.
Probability functions#
Random variables are intimately connected with probability functions, which quantify the likelihood of different outcomes associated with those variables.
Discrete Random Variables#
Discrete random variables, representing outcomes with distinct values, are characterized by probability mass functions (PMFs), providing a systematic way to assign probabilities to each possible outcome.
Probability Mass Function (PMF)
The probability mass function (PMF), denoted as $p_X(x) = P(X = x)$, gives the probability that the discrete random variable $X$ takes the value $x$. A valid PMF satisfies $p_X(x) \ge 0$ for all $x$ and $\sum_{x} p_X(x) = 1$.
Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF), denoted as $F_X(x) = P(X \le x)$, gives the probability that $X$ takes a value less than or equal to $x$. For a discrete random variable, $F_X(x) = \sum_{t \le x} p_X(t)$.
Continuous Random Variables#
Continuous random variables, on the other hand, are associated with probability density functions (PDFs), which describe the relative likelihood of the variable taking on different values within a continuous range.
Probability Density Function (PDF)
The probability density function (PDF), denoted as $f_X(x)$, describes the relative likelihood that the continuous random variable $X$ takes values near $x$. Probabilities are obtained by integration: $P(a \le X \le b) = \int_{a}^{b} f_X(x)\,dx$, where $f_X(x) \ge 0$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF), denoted as $F_X(x) = P(X \le x)$, is given by

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt.$$

Here, $f_X$ is the PDF of $X$; wherever $F_X$ is differentiable, $f_X(x) = \frac{d}{dx} F_X(x)$.
Discrete Probability Distributions#
Discrete uniform distribution#
Example 2.33: Discrete uniform distribution
Consider the scenario of rolling a fair six-sided die. The outcome of this experiment follows a discrete uniform distribution, where each outcome has an equal probability of occurrence.
Table of Possible Values and Probabilities:

| Outcome ($x$) | Probability ($P(X = x)$) |
|---|---|
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |
Visualization: The probability mass function (PMF) represents the probabilities associated with each possible outcome. In this case, it would be a histogram or bar plot where the height of each bar corresponds to the probability of the corresponding outcome.
The cumulative distribution function (CDF) shows the cumulative probability up to each possible outcome. It starts at zero and increases step by step as we move through the possible outcomes.
R Code:
# Define the outcomes and their corresponding probabilities
outcomes <- 1:6
probabilities <- rep(1/6, 6)
# Visualize PMF
barplot(probabilities, names.arg = outcomes, xlab = "Outcome", ylab = "Probability", main = "Probability Mass Function")
# Calculate CDF
cdf <- cumsum(probabilities)
# Visualize CDF
plot(outcomes, cdf, type = "s", xlab = "Outcome", ylab = "Cumulative Probability", main = "Cumulative Distribution Function")
Tip
In R’s `plot()` function, the `type` parameter specifies the type of plot to be drawn. Here are some common options:

- `"p"`: Points (scatter plot)
- `"l"`: Lines
- `"b"`: Both points and lines
- `"o"`: Overplotted points and lines
- `"h"`: Histogram-like vertical lines
- `"s"`: Stairs (step function)
- `"n"`: No plotting

For example, using `type = "s"` will create a step function plot.
The provided R code generates a plot representing the cumulative distribution function (CDF) for a discrete random variable. The plot consists of a step function where each step is represented by a horizontal line segment, extending from the left endpoint to the right endpoint of each step. Additionally, dots are placed at the left endpoints of the steps, and circles with holes are placed at the right endpoints. Furthermore, the plot extends the horizontal line segment from (6,1) to (7,1), representing the fact that the cumulative probability remains at 1 beyond the sixth outcome.
# Define the outcomes and their corresponding probabilities
outcomes <- 1:6
probabilities <- rep(1/6, 6)
# Calculate CDF
cdf <- cumsum(probabilities)
# Create an empty plot with specified x-axis and y-axis limits
plot(outcomes, type = "n", xlab = "Outcome", ylab = "Cumulative Probability", main = "Cumulative Distribution Function", xlim = c(0, 7), ylim = c(0, 1))
# Add the step function with segments of horizontal lines
for (i in 1:(length(outcomes)-1)) {
segments(outcomes[i], cdf[i], outcomes[i+1], cdf[i], lwd = 2) # Horizontal lines
}
# Add dots at the left ends of each step
points(outcomes, cdf, pch = 19, cex = 1.5) # Dot (pch = 19)
# Add circles at the right ends of each step
for (i in 1:(length(outcomes)-1)) {
points(outcomes[i+1], cdf[i], pch = 21, cex = 1.5) # Circle with hole (pch = 21)
}
# Draw a horizontal line extended from (6,1)
segments(6, 1, 7, 1, lwd = 2)
Poisson distribution#
Example 2.34: Poisson distribution for number of claims
In insurance, the number of claims that arise up to a given time period can often be modeled using the Poisson distribution. This distribution is commonly used when dealing with count data, such as the number of accidents, claims, or arrivals in a fixed interval of time or space.
Probability Mass Function (PMF) and Cumulative Distribution Function (CDF)
The probability mass function (PMF) of the Poisson distribution is given by:

$$P(N = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$

Where:

- $N$ is the random variable representing the number of claims,
- $k$ is a non-negative integer representing the number of claims,
- $\lambda$ is the average rate of claims occurring per unit time.

The cumulative distribution function (CDF) of the Poisson distribution can be calculated as:

$$F(k) = P(N \le k) = \sum_{i=0}^{k} \frac{e^{-\lambda} \lambda^{i}}{i!}.$$
Example Table of Possible Values and Probabilities
| Number of Claims ($k$) | Probability ($P(N = k)$) |
|---|---|
| 0 | $e^{-\lambda}$ |
| 1 | $\lambda e^{-\lambda}$ |
| 2 | $\frac{\lambda^{2}}{2} e^{-\lambda}$ |
| … | … |
Visualizing PMF and CDF
You can visualize the PMF by plotting the probabilities of different numbers of claims on a bar chart. For the CDF, you can plot the cumulative probabilities against the number of claims.
R Code Example
# Function to calculate PMF for Poisson distribution
pmf_poisson <- function(k, lambda) {
return(exp(-lambda) * lambda^k / factorial(k))
}
# Function to calculate CDF for Poisson distribution
cdf_poisson <- function(k, lambda) {
cdf <- rep(0, length(k))
for (i in 1:length(k)) {
cdf[i] <- sum(pmf_poisson(0:k[i], lambda))
}
return(cdf)
}
# Define parameters
lambda <- 3
k <- 0:5
# Calculate PMF
pmf_values <- pmf_poisson(k, lambda)
cat("PMF Values:", pmf_values, "\n")
# Calculate CDF
cdf_values <- cdf_poisson(k, lambda)
cat("CDF Values:", cdf_values, "\n")
# Plot PMF
barplot(pmf_values, names.arg = k, xlab = "Number of Claims (k)", ylab = "Probability", main = "PMF of Poisson Distribution")
# Plot CDF
plot(k, cdf_values, type = "s", xlab = "Number of Claims (k)", ylab = "Cumulative Probability", main = "CDF of Poisson Distribution")
Binomial distribution#
Discrete Probability Distribution Example: Binomial Distribution in Actuarial Science
In actuarial science, the Binomial distribution is often used to model the number of successful outcomes in a fixed number of independent Bernoulli trials. It’s applicable in scenarios such as the number of travel insurance policies that result in at most one claim.
Probability Mass Function (PMF) and Cumulative Distribution Function (CDF)
The probability mass function (PMF) of the Binomial distribution is given by:

$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n$$

Where:

- $X$ is the random variable representing the number of policies resulting in at most one claim,
- $k$ is a non-negative integer representing the number of policies resulting in at most one claim,
- $n$ is the total number of travel insurance policies,
- $p$ is the probability that a policy results in at most one claim.

The cumulative distribution function (CDF) of the Binomial distribution can be calculated using the formula:

$$F(k) = P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1 - p)^{n - i}.$$
Example Table of Possible Values and Probabilities
| Policies Resulting in at Most One Claim ($k$) | Probability ($P(X = k)$) |
|---|---|
| 0 | $(1 - p)^{n}$ |
| 1 | $n p (1 - p)^{n - 1}$ |
| 2 | $\binom{n}{2} p^{2} (1 - p)^{n - 2}$ |
| … | … |
Visualizing PMF and CDF
You can visualize the PMF by plotting the probabilities of different numbers of policies resulting in at most one claim on a bar chart. For the CDF, you can plot the cumulative probabilities against the number of policies resulting in at most one claim.
R Code Example
# Example R code for Binomial distribution
n <- 10 # Number of trials
p <- 0.3 # Probability of success in each trial
k_values <- 0:n # Number of successes
pmf <- dbinom(k_values, size = n, prob = p) # Probability Mass Function
cdf <- pbinom(k_values, size = n, prob = p) # Cumulative Distribution Function
# Plotting PMF
barplot(pmf, names.arg = k_values, xlab = "Number of Successes", ylab = "Probability", main = "Binomial PMF")
# Plotting CDF
plot(k_values, cdf, type = "s", xlab = "Number of Successes", ylab = "Cumulative Probability", main = "Binomial CDF")
Expected Value and Variance of a Discrete Random Variable#
When dealing with random variables, understanding their expected value and variance is crucial for making predictions and analyzing outcomes. The expected value, denoted by $E[X]$, represents the long-run average outcome of the random variable, while the variance, denoted by $\mathrm{Var}(X)$, measures how widely the outcomes spread around that average.
Expected value
The expected value $E[X]$ of a discrete random variable $X$ with probability mass function $p_X(x)$ is defined as

$$E[X] = \sum_{x} x \, p_X(x).$$
Variance
The variance $\mathrm{Var}(X)$ of a discrete random variable $X$ is defined as

$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = \sum_{x} (x - E[X])^2 \, p_X(x) = E[X^2] - (E[X])^2.$$
Example 2.35: Calculation of expected value and variance
Consider the scenario of rolling a fair six-sided die. Let $X$ denote the outcome of the roll.

The probability mass function for a fair six-sided die is $P(X = x) = \frac{1}{6}$ for $x = 1, 2, \ldots, 6$.

Expected Value:

$$E[X] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$$

So, the expected value of rolling a fair six-sided die is 3.5.

Variance:

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{1^2 + 2^2 + \cdots + 6^2}{6} - (3.5)^2 = \frac{91}{6} - 12.25 = \frac{35}{12} \approx 2.92$$

So, the variance of rolling a fair six-sided die is $\frac{35}{12} \approx 2.92$.
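The same calculation can be checked directly in R by summing over the PMF of the die:

```r
# Outcomes and PMF of a fair six-sided die
x <- 1:6
p <- rep(1/6, 6)

# Expected value: E[X] = sum of x * P(X = x)
ex <- sum(x * p)              # 3.5

# Variance: Var(X) = E[X^2] - (E[X])^2
varx <- sum(x^2 * p) - ex^2   # 35/12, about 2.92

ex
varx
```

Replacing `x` and `p` with any other finite set of outcomes and probabilities gives the expected value and variance of that discrete distribution in the same two lines.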
Expected Value and Variance of a Continuous Random Variable#
Understanding the expected value and variance of continuous random variables is essential for analyzing processes where outcomes vary continuously. The expected value, denoted by $E[X]$, represents the average outcome we anticipate, while the variance, denoted by $\mathrm{Var}(X)$, measures the spread of outcomes around this average.
Formulas
The expected value $E[X]$ of a continuous random variable $X$ with probability density function $f_X(x)$ is defined as

$$E[X] = \int_{-\infty}^{\infty} x \, f_X(x)\,dx.$$

And the variance $\mathrm{Var}(X)$ is

$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = \int_{-\infty}^{\infty} (x - E[X])^2 \, f_X(x)\,dx = E[X^2] - (E[X])^2.$$
Example 2.36: Calculation of expected value and variance
Consider the scenario where we model the time between insurance claims using the Exponential distribution. The Exponential distribution is commonly used to represent the waiting time until the occurrence of a continuous event. Let $T$ denote the time between claims, measured in days.

For the Exponential distribution, the probability density function is given by:

$$f_T(t) = \lambda e^{-\lambda t}, \quad t \ge 0,$$

where $\lambda > 0$ is the rate parameter.

Suppose we have an insurance company with an average of 2 claims per day. We can model the time between claims with an Exponential distribution with $\lambda = 2$.

Expected Value:

$$E[T] = \int_{0}^{\infty} t \, \lambda e^{-\lambda t}\,dt = \frac{1}{\lambda} = \frac{1}{2}$$

So, the expected value of the time between claims is 0.5 days.

Variance:

$$\mathrm{Var}(T) = \frac{1}{\lambda^2} = \frac{1}{4}$$

So, the variance of the time between claims is 0.25 days².
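These integrals can be verified numerically in R with `integrate`, using the Exponential density with $\lambda = 2$ from the example:

```r
# Exponential PDF with rate lambda = 2, as in the example
lambda <- 2
f <- function(t) lambda * exp(-lambda * t)

# E[T] = integral of t * f(t) over [0, Inf)
et <- integrate(function(t) t * f(t), lower = 0, upper = Inf)$value   # 0.5

# E[T^2], then Var(T) = E[T^2] - (E[T])^2
et2 <- integrate(function(t) t^2 * f(t), lower = 0, upper = Inf)$value
vart <- et2 - et^2                                                    # 0.25

et
vart
```

The numerical results match the closed-form values $1/\lambda = 0.5$ and $1/\lambda^2 = 0.25$.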
R Functions for Probability Distributions#
In R, various functions are available for working with probability distributions.
For the Poisson distribution with parameter $\lambda$:

Poisson Distribution:

- Density function: `dpois(x, lambda, log = FALSE)`: computes the probability mass function for the Poisson distribution at the specified values of $x$ with parameter $\lambda$.
- Distribution function: `ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)`: computes the cumulative distribution function for the Poisson distribution at the specified quantiles $q$ with parameter $\lambda$.
- Quantile function (inverse c.d.f.): `qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)`: computes the quantile function (inverse cumulative distribution function) for the Poisson distribution at the specified probabilities $p$ with parameter $\lambda$.
- Random generation: `rpois(n, lambda)`: generates $n$ random deviates from the Poisson distribution with parameter $\lambda$.
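For example, reusing $\lambda = 3$ from the claims example above, the four functions are called as follows:

```r
lambda <- 3

dpois(2, lambda)     # P(N = 2): probability of exactly 2 claims
ppois(2, lambda)     # P(N <= 2): probability of at most 2 claims
qpois(0.95, lambda)  # smallest k with P(N <= k) >= 0.95
set.seed(1)
rpois(5, lambda)     # five simulated claim counts
```

Note that `dpois` and `ppois` reproduce the hand-coded `pmf_poisson` and `cdf_poisson` functions from the earlier example.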
For the Binomial distribution with parameters $n$ (size) and $p$ (probability of success):

Binomial Distribution:

- Density function: `dbinom(x, size, prob, log = FALSE)`: computes the probability mass function for the Binomial distribution at the specified values of $x$ with parameters $n$ (size) and $p$ (probability of success).
- Distribution function: `pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)`: computes the cumulative distribution function for the Binomial distribution at the specified quantiles $q$ with parameters $n$ and $p$.
- Quantile function (inverse c.d.f.): `qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)`: computes the quantile function (inverse cumulative distribution function) for the Binomial distribution at the specified probabilities with parameters $n$ and $p$.
- Random generation: `rbinom(n, size, prob)`: generates random deviates from the Binomial distribution with parameters `size` and `prob`.
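For example, with the same parameters as the earlier Binomial code ($n = 10$, $p = 0.3$):

```r
n <- 10   # number of trials
p <- 0.3  # probability of success in each trial

dbinom(3, size = n, prob = p)    # P(X = 3)
pbinom(3, size = n, prob = p)    # P(X <= 3)
qbinom(0.5, size = n, prob = p)  # median number of successes
set.seed(1)
rbinom(5, size = n, prob = p)    # five simulated success counts
```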
Similarly, for the Normal distribution:
Normal Distribution:

- Density function: `dnorm(x, mean = 0, sd = 1, log = FALSE)`: computes the probability density function for the Normal distribution at the specified values of $x$ with parameters $\mu$ (mean) and $\sigma$ (standard deviation).
- Distribution function: `pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)`: computes the cumulative distribution function for the Normal distribution at the specified quantiles $q$ with parameters $\mu$ and $\sigma$.
- Quantile function (inverse c.d.f.): `qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)`: computes the quantile function (inverse cumulative distribution function) for the Normal distribution at the specified probabilities $p$ with parameters $\mu$ and $\sigma$.
- Random generation: `rnorm(n, mean = 0, sd = 1)`: generates $n$ random deviates from the Normal distribution with parameters $\mu$ and $\sigma$.
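For example, for the standard Normal distribution ($\mu = 0$, $\sigma = 1$):

```r
dnorm(0)      # density at 0, equal to 1/sqrt(2*pi)
pnorm(1.96)   # P(Z <= 1.96), approximately 0.975
qnorm(0.975)  # upper 97.5% quantile, approximately 1.96
set.seed(1)
rnorm(3, mean = 0, sd = 1)  # three standard normal deviates
```

Note that `pnorm` and `qnorm` are inverses of each other, which is why the familiar 1.96 critical value for 95% confidence intervals appears here.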