Chapter 2 Review of probability theory

2.1 Random variables

The dynamics of a stochastic process are described by random variables and probability distributions. This section provides a brief review of the properties of random variables.

Probability theory is about random variables. Roughly speaking, a random variable can be regarded as an uncertain numerical quantity (i.e. a value in \(\mathbb{R}\)) whose possible values depend on the outcomes of a certain random phenomenon. A random variable is usually denoted by a capital letter \(X, Y, \ldots,\) etc.

More precisely, let \(S\) be a sample space. A random variable \(X\) is a real-valued function defined on the sample space \(S\), \[X : S \rightarrow \mathbb{R}.\] Hence, the random variable \(X\) is a function that maps outcomes to real values.

Example 2.1 Two coins are tossed simultaneously and the outcomes are \(HH, HT, TH\) and \(TT\). We can associate the outcomes of this experiment with the set \(A = \{1,2,3,4 \}\), where \(X(HH) = 1, X(HT) = 2, X(TH) = 3\) and \(X(TT) = 4\). Assume each of the outcomes has an equal probability of 1/4. Here, we can associate a function \(P\) (known as a probability measure) defined on \(S = \{HH, HT, TH, TT \}\) by

\[P(HH) = 1/4, P(HT) = 1/4, P(TH) = 1/4, P(TT) = 1/4.\]
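As an illustrative sketch (the simulation is not part of the formal setup), this experiment can be reproduced in R and the probability of each outcome estimated from relative frequencies:

```r
# Simulate tossing two coins 10,000 times; each relative frequency
# should be close to the theoretical probability 1/4.
set.seed(1)
outcomes <- c("HH", "HT", "TH", "TT")
tosses <- sample(outcomes, size = 10000, replace = TRUE)
table(tosses) / 10000
```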

A probability measure \(P : \mathcal{A} \rightarrow [0,1]\), where \(\mathcal{A}\) is a collection of subsets of \(S\), has the following properties:

  1. \(0 \le P(A), \quad A \subset S\).

  2. \(P(S) = 1\).

  3. If \(A_1, A_2, \ldots\) are pairwise disjoint events in \(\mathcal{A}\), i.e. \(A_i \cap A_j = \emptyset\) whenever \(i \neq j\), then \[P(\cup^\infty_{i=1} A_i) = \sum^\infty_{i=1} P(A_i).\]
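For example, these properties imply the complement rule: since \(A \cup A^c = S\) and \(A \cap A^c = \emptyset\), \[1 = P(S) = P(A \cup A^c) = P(A) + P(A^c), \quad \text{so} \quad P(A^c) = 1 - P(A).\]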

Random variables can be discrete or continuous. If the range of a random variable is finite or countably infinite, then the random variable is a discrete random variable. Otherwise, if its range is an uncountable set, then it is a continuous random variable.

2.2 Probability distribution

The probability distribution of a random variable \(X\) is a function describing all possible values of \(X\) and the likelihood of obtaining those values. The functions that define the probability measure for a discrete and a continuous random variable are the probability mass function (pmf) and the probability density function (pdf), respectively.

Suppose \(X\) is a discrete random variable. Then the function \[f(x) = P(X = x),\] defined for each \(x\) in the range of \(X\), is called the probability mass function (p.m.f) of the random variable \(X\).

Suppose \(X\) is a continuous random variable with cumulative distribution function (c.d.f) \(F\), and there exists a nonnegative, integrable function \(f: \mathbb{R} \rightarrow [0, \infty)\) such that \[F(x) = \int_{-\infty}^x f(y)\, dy.\] Then the function \(f\) is called the probability density function (p.d.f) of the random variable \(X\).
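The following R sketch contrasts the two cases; the Poisson and normal distributions are used purely as illustrative choices. `dpois` evaluates a p.m.f at integer points, while `dnorm` is a p.d.f whose integral recovers the c.d.f:

```r
# p.m.f of a discrete random variable: the probabilities sum to 1
sum(dpois(0:100, lambda = 5))             # approximately 1

# p.d.f of a continuous random variable: integrating the density
# up to x recovers the c.d.f F(x)
integrate(dnorm, lower = -Inf, upper = 1.5)$value
pnorm(1.5)                                # should agree with the integral
```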

Examples of discrete and continuous random variables

The main quantities of interest in a portfolio of motor insurance are the number of claims arriving in a fixed time period and the sizes of those claims. Clearly, the number of claims can be described by a discrete random variable, whose range is finite or countably infinite. On the other hand, the claim sizes can be described by a continuous random variable defined over a continuous sample space.

Example 2.2 Let \(N\) denote the number of claims which arise up to a given time. The range of all possible values of \(N\) is \(\mathbb{N} \cup \{0\}\). Here \(N\) is an example of a discrete random variable. We could model the number of claims by the Poisson family of distributions. Recall that a random variable \(N\) has a Poisson distribution with parameter \(\lambda\) if its probability mass function is given by \[f(n) = e^{- \lambda} \frac{\lambda^n}{n !}, \quad \text{ for } n = 0,1,\ldots.\]

Now suppose further that the number of claims \(N\) which arise on a portfolio in a week has a \(\text{Poisson}(\lambda)\) distribution with \(\lambda = 5\). Calculate the following quantities:

  1. \(\Pr(N \ge 6)\).

  2. \(\mathrm{E}[N]\).

  1. \(\Pr(N \ge 6) = 1 - \Pr(N \le 5) = 1 - \sum_{n=0}^5 f(n) = 0.3840393.\)

  2. \(\mathrm{E}[N] = \lambda = 5\); see the short derivation below.
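The value of the mean used in part 2 follows directly from the Poisson p.m.f: \[\mathrm{E}[N] = \sum_{n=0}^\infty n\, e^{-\lambda} \frac{\lambda^n}{n!} = \lambda e^{-\lambda} \sum_{n=1}^\infty \frac{\lambda^{n-1}}{(n-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.\]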

In R, the density and distribution functions for the Poisson distribution with parameter \(\lambda\) are as follows:

| Distribution | Density function: \(P(X = x)\) | Distribution function: \(P(X \le x)\) |
|--------------|--------------------------------|---------------------------------------|
| Poisson | `dpois(x, lambda, log = FALSE)` | `ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)` |

When `lower.tail = TRUE` (the default), the probabilities are \(P(X \le x)\); otherwise, they are \(P(X > x)\).

```r
# P(N >= 6) = P(N > 5) for N ~ Poisson(5)
ppois(5, lambda = 5, lower.tail = FALSE)
# [1] 0.3840393
```

Example 2.3 Let \(X\) denote the claim size in a given time period. The range of all possible values of \(X\) is the set of all non-negative numbers. Here \(X\) is an example of a continuous random variable. Suitable families of distributions for modelling claim sizes are "fat-tailed" distributions, which allow for the possibility of large claim sizes.

Examples of fat-tailed distributions include

  • the Pareto distribution,

  • the Log-normal distribution,

  • the Weibull distribution with shape parameter greater than 0 but less than 1, and

  • the Burr distribution.

```r
# Plot the density of the Weibull distribution with shape = 1 and scale = 1
library(ggplot2)
df <- data.frame(x = seq(0, 10, by = 0.1))
ggplot(df) +
  stat_function(aes(x), fun = dweibull, args = list(shape = 1, scale = 1)) +
  labs(x = "x", y = "f(x)",
       title = "Weibull Distribution With Shape & Scale Parameters = 1")
```

See https://dk81.github.io/dkmathstats_site/rvisual-cont-prob-dists.html for more details.

The course "SCMA 470 Risk Analysis and Credibility" provides more details about the loss distribution.

2.3 Conditional probability

A stochastic process can be defined as a collection or sequence of random variables. The concept of conditional probability plays an important role in analysing the dependence between random variables in the process. Roughly speaking, conditional probability is the probability of some event given that some other event is known to have occurred.

Let \(A\) and \(B\) be two events (elements of \(\mathcal{A}\)). The conditional probability of event \(A\) given \(B\), denoted by \(P(A | B)\), is defined as \[P(A|B) = \frac{P(A \cap B)}{P(B)},\] provided that \(P(B) > 0\). Note that \(P(A \cap B)\) is often called the joint probability of \(A\) and \(B\), and \(P(A)\) and \(P(B)\) are often called the marginal probabilities of \(A\) and \(B\), respectively.

The events \(A\) and \(B\) are independent if the occurrence of either one of the events does not affect the probability of occurrence of the other. More precisely, the events \(A\) and \(B\) are independent if \[P(A \cap B) = P(A)P(B),\] or equivalently (when \(P(B) > 0\)), \[P(A|B) = P(A).\]
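As an illustrative sketch (not part of the formal definition), independence can be checked empirically in R by simulating two fair coin tosses, where \(A\) is the event that the first toss is heads and \(B\) is the event that the second toss is heads:

```r
# Estimate P(A and B) and P(A)P(B); for independent tosses they agree.
set.seed(1)
n <- 100000
first  <- sample(c("H", "T"), n, replace = TRUE)
second <- sample(c("H", "T"), n, replace = TRUE)
mean(first == "H" & second == "H")        # estimate of P(A and B)
mean(first == "H") * mean(second == "H")  # estimate of P(A)P(B)
```

Both estimates should be close to 1/4.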

2.4 Law of total probability

Suppose there are three events: \(A\), \(B\), and \(C\). Events \(B\) and \(C\) are mutually exclusive and together make up the whole sample space, while event \(A\) intersects with both of them. We do not know the probability of event \(A\) directly. However, partial information can be used to calculate it: we know the probability of event \(A\) under condition \(B\) and the probability of event \(A\) under condition \(C\).

The law of total probability states that these two conditional probabilities determine the probability of event \(A\): \[P(A) = P(A \cap B) + P(A \cap C) = P(A|B)P(B) + P(A|C)P(C).\] In general, suppose \(B_1, B_2, \ldots, B_n\) is a collection of events that partition the sample space. Then for any event \(A\), \[P(A) = \sum_{i = 1}^n P(A \cap B_i ) = \sum_{i = 1}^n P(A | B_i ) P(B_i) .\]

Example 2.4 Suppose in a particular study area, the vaccination rate for the yearly flu virus is 70%. In addition, 10% of those vaccinated still get the flu that year. Calculate the conditional probability of someone getting the flu in this area given that the person was vaccinated.

Example 2.5 You are an investor buying shares of a company. You have discovered that the company is planning to introduce a new project that is likely to affect the company’s stock price. You have determined the following probabilities:

  • There is an 80% probability that the new project will be launched.

  • If the company launches the project, there is an 85% probability that the company’s stock price will increase.

  • If the company does not launch the project, there is a 30% probability that the company’s stock price will increase.

Calculate the probability that the company’s stock price will increase.
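As an illustrative check, the law of total probability can be applied in R, conditioning on whether the project is launched (the variable names below are hypothetical):

```r
# P(increase) = P(increase | launch) P(launch)
#             + P(increase | no launch) P(no launch)
p_launch              <- 0.80
p_inc_given_launch    <- 0.85
p_inc_given_no_launch <- 0.30
p_inc_given_launch * p_launch + p_inc_given_no_launch * (1 - p_launch)
# [1] 0.74
```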

2.5 Conditional distribution and conditional expectation

Let \(X\) and \(Y\) be two discrete random variables with joint probability mass function \[f(x,y) = P(X = x, Y = y).\] If \(X\) and \(Y\) are continuous random variables, the joint probability density function \(f (x, y)\) satisfies \[P( X \le x, Y \le y) = \int_{-\infty}^x \int_{-\infty}^y f(u,v) \, du\, dv.\]

When no information is given about the value of \(Y\), the marginal probability density function of \(X\), \(f_X(x)\), is used to calculate the probabilities of events concerning \(X\). However, when the value of \(Y\) is known, such probabilities are found using \(f_{X|Y} (x|y)\), the conditional probability density function of \(X\) given that \(Y = y\), defined as follows: \[f_{X|Y} (x|y) = \frac{f(x,y)}{f_Y(y)},\] provided that \(f_Y (y) > 0\). The conditional probability mass function of \(X\) given that \(Y = y\) is defined in a similar manner: \[P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}.\]

Note also that the conditional probability density function of \(X\) given that \(Y = y\) is itself a probability density function, i.e. \[\int_{-\infty}^\infty f_{X|Y}(x|y)\, dx = 1.\]

Using the conditional probability density function of \(X\) given that \(Y = y\), the conditional distribution function and the conditional expectation of \(X\) given that \(Y = y\) can be written as follows: \[F_{X|Y}(x|y) = P(X \le x | Y = y) = \int_ {-\infty}^x f_{X|Y}(t|y) \, dt\] and \[\mathrm{E}(X|Y = y) = \int_{-\infty}^{\infty} x f_{X|Y}(x|y) \, dx,\] where \(f_Y(y) > 0\).

Note that if \(X\) and \(Y\) are independent, then \(f_{X|Y}\) coincides with \(f_X\) because \[f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} =\frac{f_X(x)f_Y(y)}{f_Y(y)} = f_X(x).\]
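As a small numerical sketch, the joint p.m.f below (a hypothetical matrix chosen purely for illustration) is used to compute the conditional p.m.f of \(X\) given \(Y = 1\) and the conditional expectation \(\mathrm{E}(X|Y = 1)\) in R:

```r
# Hypothetical joint p.m.f of X in {0,1,2} (rows) and Y in {0,1} (columns)
joint <- matrix(c(0.10, 0.20,
                  0.20, 0.25,
                  0.15, 0.10), nrow = 3, byrow = TRUE)
x_vals <- 0:2
f_Y <- colSums(joint)                   # marginal p.m.f of Y
cond_X <- joint[, 2] / f_Y[2]           # conditional p.m.f of X given Y = 1
sum(cond_X)                             # equals 1, as it must
sum(x_vals * cond_X)                    # E(X | Y = 1)
```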

2.6 Central Limit Theorem

This section introduces the Central Limit Theorem, an important theorem in probability theory. It states that the mean of \(n\) independent and identically distributed random variables has an approximately normal distribution for sufficiently large \(n\). This applies to a collection of random variables from any distribution with a finite mean and variance. In summary, we can use the Central Limit Theorem to extract probabilistic information about sums of independent and identically distributed random variables.

Let \(X_1, X_2, \ldots\) be a sequence of i.i.d. random variables with a finite mean \(\mathrm{E}[X_i] = \mu\) and finite variance \(\mathrm{Var}[X_i] = \sigma^2\). Let \(Z_n\) be the normalised average of the first \(n\) random variables \[\begin{aligned} Z_n &= \frac{\sum_{i=1}^n X_i/n - \mu}{\sigma/\sqrt{n}} \\ &= \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma \sqrt{n}}. \end{aligned}\] Then \(Z_n\) converges in distribution to a standard normal distribution.
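The theorem can be visualised by simulation; the sketch below uses exponential random variables as an arbitrary non-normal choice (for \(\text{Exp}(1)\), \(\mu = 1\) and \(\sigma = 1\)):

```r
# Simulate many normalised averages Z_n; they should look standard normal.
set.seed(1)
n <- 1000                                 # sample size per average
z <- replicate(5000, {
  x <- rexp(n, rate = 1)
  (mean(x) - 1) / (1 / sqrt(n))
})
mean(z); sd(z)                            # approximately 0 and 1
hist(z, breaks = 50, freq = FALSE,
     main = "Normalised averages vs N(0,1)")
curve(dnorm(x), add = TRUE)
```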