4 Random Variables

Learning Outcomes

At the end of this chapter you should be able to:

  1. explain the concept of a random variable;
  2. identify random variables as discrete or continuous;
  3. work with probability mass functions and cumulative distribution functions;
  4. compute expectations and variances and know their properties;
  5. understand the concept of the binomial distribution;
  6. understand the concept of the Poisson distribution;
  7. model and solve problems using random variables;
  8. conduct hypothesis tests for binomial proportion and Poisson mean.

4.1 Introduction

A random experiment or event is one for which the outcome is not known in advance. Examples are:

  • Will the stock market crash in October?
  • Will interest rates go up next month?
  • Will there be another pandemic in the next decade?
  • Will there be another extreme weather event in Australia this year?

For such experiments/events the outcomes can only be described in terms of probabilities.

Two problems

  • The outcomes of random experiments are often not numerical.
    • Toss a single coin once. Sample space is S = \{H, T\}.
    • Interest rates: S = \{ \rm Up,\  Down,\ Unchanged\}

This makes it difficult to apply the full power of mathematics to these situations.

  • Several phenomena have common features.
    • Toss a coin ten times. What is the probability of two heads?
    • What is the probability that on two of the next ten days the Dow Jones index closes higher?

It is more efficient to study a generic model and then apply the results to all of them. This is called abstraction.

Solution

Translate outcomes from the sample space into numbers.

For the coin tossing example, the sample space is \{H, T\}. Put

    \[X = \begin{cases} 1, & \text{\ if H comes up}\\ 0, & \text{\ Otherwise} \end{cases}\]

Thus here X counts the number of H obtained in one toss of a coin. The value X takes is random; it depends on chance. We call X a random variable.

Random Variable

A random variable (rv) is a function (mapping) from a sample space to the real numbers.

Example 4.1

Toss a coin twice. The sample space is S=\left\{HH,HT,TH,TT\right\}. Let the random variable X denote the number of H tossed. Then

    \[X = \begin{cases} 0, & \text{\ if TT comes up}\\ 1, & \text{\ if HT or TH comes up}\\ 2, & \text{\ if HH comes up} \end{cases}\]

Exercise

Let the rv Y correspond to the event at least one H is tossed when a coin is tossed twice. What values can Y take, and which events in the sample space do they correspond to?

4.2 Discrete random variables

A random variable is discrete if its set of possible values can be listed. More formally, a random variable is discrete if its sample space is (finitely or infinitely) countable. Otherwise it is continuous.

In Example 4.1 above, X is a discrete random variable.

Discrete random variables usually arise from counting processes (the number of: employees with degrees; breakdowns; insurance claims; faulty laptops). In contrast, continuous random variables usually arise from measurements (returns from a diversified assets portfolio, equipment lifetime, weight of a bag of flour labelled 1 kg).

Continuous random variables will be covered in Chapter 6.

Random variables and Data

Discrete data (usually) arises from observation on discrete random variables.  Continuous data (usually) arises from observation on continuous random variables.

4.3 Probability mass function

The probability mass function (pmf) gives probabilities for discrete random variables. The pmf comes from the sample space.

For the single toss of a fair coin, let

    \[X = \begin{cases} 1, & \text{\ if H comes up}\\ 0, & \text{\ otherwise.} \end{cases}\]

Then

    \[P(X=1) = P(H) = \tfrac{1}{2}, P(X=0) = P(T) = \tfrac{1}{2}.\]

This can be tabulated.

x 0 1
p_X(x) = P(X=x) 0.5 0.5

    \[p_X(x) = P(X=x) = P({\rm rv}\ X {\rm \ takes\ the\ value\ } \ x).\]

p_X is called the pmf of X. Sometimes p_X can be given by a formula. In other cases we can only give p_X as a table.

Note on Notation

We use upper case letters from the end of the alphabet (such as U,V,W,X,Y,Z) to represent random variables, and the corresponding lower case letter (u,v,w,x,y,z) to represent a possible value of the rv.

Properties of pmfs

  1. p_X(x) \ge 0 for all possible values of x. Compare this with axiom P2 for probabilities.
  2.     \[\sum_x p_X(x) = 1,\]

    where the sum is over all the possible values of x. Compare with axiom P1 for probabilities.

These properties follow from the properties of probabilities.

Example 4.2
The number of passengers for a scenic helicopter flight varies at random from none to a maximum of four. Passengers can either make a booking or simply turn up. The probability mass function for the rv X, which denotes the number of passengers, is given below.

x 0 1 2 3 4
p_X(x) 0.1 0.4 c 0.1 0.1

(a) Determine the value of c.

(b) What is the probability that fewer than three passengers are on a flight?

(c) What is the probability that a flight has fewer than three passengers if there is a booking for only one passenger? Assume that booked passengers always turn up.

Solution

(a) Since the probabilities sum to 1, c = 0.3.

(b) P(X < 3) = 0.1 + 0.4 + 0.3 = 0.8.

(c) Here we know there is at least one passenger. So we want

    \[P(X < 3|X \ge 1) = \frac{P(1 \le X < 3)}{P(X \ge 1)} = \frac{0.4 + 0.3}{0.9} = \frac{7}{9}.\]
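
These calculations can be checked quickly in R. The sketch below simply types in the pmf from the table (with c = 0.3) and sums the relevant probabilities; the object names are ours.

x <- 0:4
px <- c(0.1, 0.4, 0.3, 0.1, 0.1)            # pmf of X, with c = 0.3
sum(px)                                     # check: the probabilities sum to 1
sum(px[x < 3])                              # (b) P(X < 3) = 0.8
sum(px[x >= 1 & x < 3]) / sum(px[x >= 1])   # (c) P(X < 3 | X >= 1) = 7/9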

Note 

  1. AND is equivalent to intersection (\cap).
  2. OR is equivalent to union (\cup).
  3. IF is equivalent to conditional.

Defining events

Put A = \{X<3\}, B=\{X\ge 1\}. These are events. Part (c) asks for

    \[P(A\mid B) = \frac{P(A \cap B)}{P(B)}.\]

ALL the rules of probability still hold:

    \[P(A\mid B), P(A\cup B), P(A\cap B);\]

Further, the theorem of Total Probabilities holds:

    \[P(A) = P(A\cap B) + P(A\cap \overline B);\]

as does the rule for complements:

    \[P(\overline A) = P(A^c) = 1-P(A).\]

4.4 Cumulative distribution function

The cumulative distribution function of a random variable X at x, denoted F_X(x), is defined as

    \[F_X(x) = P(X \le x).\]

For a discrete random variable,

    \begin{align*} F_X(x) &= P(X \le x)\\ &= \sum_{k \le x} p_X(k), \end{align*}

that is, the sum of probabilities for all values less than and including x.

Example 4.3

x -1 0 1 2
p_X(x) 0.2 0.1 0.4 0.3
F_X(x) 0.2 0.3 0.7 1.0

Using the cumulative distribution function,

    \begin{align*} P(X \le 1) &= F_X(1) = 0.7\\ P(-1 < X \le 1) &= P(X \le 1) - P(X \le -1) = F_X(1) - F_X(-1) = 0.7 - 0.2 = 0.5\\ P(X<1) &= P(X \le 0) = F_X(0) = 0.3\\ P(0 \le X \le 1) &= P(X \le 1) - P(X \le -1) = F_X(1) - F_X(-1) = 0.7 - 0.2 = 0.5\\ F_X(-2) &= P(X \le -2) = 0\\ F_X(3) &= P(X \le 3) = 1 \end{align*}
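
As a check, the same cdf values can be obtained in R with cumsum(), which forms the running total of the pmf. This is just a sketch; the vectors below are the table above typed in.

x <- c(-1, 0, 1, 2)
px <- c(0.2, 0.1, 0.4, 0.3)
Fx <- cumsum(px)            # cdf at the possible values: 0.2 0.3 0.7 1.0
Fx[x == 1]                  # P(X <= 1) = 0.7
Fx[x == 1] - Fx[x == -1]    # P(-1 < X <= 1) = 0.5
Fx[x == 0]                  # P(X < 1) = P(X <= 0) = 0.3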

Some properties of CDFs

  1. By properties of probabilities, it follows that 0 \le F_X(x) \le 1.
  2. In general, P(a  < X \le b) = P(X \le b) -  P(X \le a) = F_X(b) - F_X(a). Note that this includes the probability of b but not of a.
  3. The cdf and pmf contain the same information but in different forms, and one can be obtained from the other.

Graph of the CDF

The graph of the cdf from Example 4.3 is given below. Both the cdf and the pmf can be read off this graph.

Figure 3. Cumulative distribution function for Example 4.3. The graph has the usual x and y axes. The cdf is a step-wise function. A straight line runs from -1 along the negative x-axis. There is an open circle at (-1,0). The point (-1,0.2) has a full circle, from which a straight line runs to the point (0,0.2) at which there is an empty circle. The point (0,0.3) has a full circle, from which a straight line runs to the point (1,0.3), at which there is an empty circle. The point (1,0.7) has a full circle, from which a straight line runs to the point (2,0.7) at which there is an empty circle. The point (2,1) has a full circle from which a straight line runs to infinity.
Figure 3. Cumulative distribution function for Example 4.3.

Note that the graph is closed at the end-points indicated by full circles, and open at the other end-points. Thus F_X(-1) = 0.2, but just below -1 the cdf is 0.

4.5 Expectation

Consider the following data:

    \[1, 1, 1, 2, 2, 3, 3, 4, 4, 5\]

We want to calculate the mean of this data. We can obtain the frequency distribution of the data, as given in the table below.

i x_i f_i f_ix_i
1 1 3 3
2 2 2 4
3 3 2 6
4 4 2 8
5 5 1 5
\sum_{i=1}^5  f_i = 10 \sum_{i=1}^5  f_i x_i= 26

The mean is calculated as

    \[\overline x = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{1+1+1+2+2+3+3+4+4+5}{10} = 2.6.\]

It can also be calculated by the grouped frequency data, as

    \[\overline x = \frac{\sum_{i=1}^5 f_i x_i}{\sum_{i=1}^5 f_i} = \frac{26}{10} = 2.6.\]

In general for a data set with n distinct values x_i each with corresponding frequency f_i, i = 1,2,\ldots, n, the mean is

    \[\overline x = \frac{\sum_{i=1}^n f_i x_i} {\sum_{i=1}^n f_i}.\]

Let \sum_{i=1}^n f_i = N. Now since the sum of the frequencies in the denominator is a constant, we can divide this into the sum in the numerator, giving

    \[\overline x = \sum_{i=1}^n \frac{f_i x_i}{N} = \sum_{i=1}^n \frac{f_i }{N}x_i.\]

The term \frac{f_i}{N} is the relative frequency for data point i. Note that

    \[0 \le \frac{f_i}{N} \le 1, {\rm \ and\ } \sum_{i=1}^n \frac{f_i}{N} = 1.\]

That is, to compute the mean of the data we multiply each data value by its relative frequency and sum these products. Note that the relative frequencies have similar properties to probabilities: they are non-negative and sum to 1. In fact, the relative frequencies are the observed probabilities of the data values.

We use this idea to define the mean of a random variable.

Definition: Expected value

For a random variable X, we define the expected value of X, denoted {\rm E}(X), by

    \[{\rm E}(X) = \sum_x x\ p_X(x),\]

where the sum is over all the possible values of x.
{\rm E}(X) is the theoretical mean, and can be thought of as a long run average of an indefinitely large number of observations on the random variable X.

Expected value of a function of a random variable

If g is a function, then for the random variable g(X),

    \[{\rm E}\left(g\left(X\right)\right) = \sum_x g(x)\ p_X(x).\]

For example,

    \[{\rm E}\left(X^2\right) = \sum_x x^2\ p_X(x).\]

Example 4.4
Toss a fair coin once and let the rv X denote the number of heads tossed. The pmf of X is given below.

x 0 1
p_X(x) 0.5 0.5

Then

    \[{\rm E}(X) = 0 \times 0.5 + 1 \times 0.5 = 0.5.\]

This is a long run average, so that in a large number of trials of tossing the coin we would expect half of them to be heads.

Example 4.5

x -1 0 1 2
p_X(x) 0.1 0.2 0.3 0.4

Calculate {\rm E}(X), {\rm E}\left(X^2\right) and {\rm E}(X+1).

Solution

    \begin{align*} {\rm E}(X) &= -1\times 0.1 + 0\times 0.2 + 1\times 0.3 + 2\times 0.4 = 1.\\ {\rm E}(X^2) &= (-1)^2\times 0.1 + 0^2\times 0.2 + 1^2\times 0.3 + 2^2\times 0.4 = 2.\\ {\rm E}(X+1) &= (-1+1) \times 0.1 + (0+1) \times 0.2 + (1+1)\times 0.3 + (2+1)\times 0.4 = 2. \end{align*}

Note that {\rm E}(X^2) \ne \left[{\rm E}(X)\right]^2.
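
The same expectations can be computed in R by multiplying each value by its probability and summing, as in the definition. A minimal sketch:

x <- c(-1, 0, 1, 2)
px <- c(0.1, 0.2, 0.3, 0.4)
sum(x * px)          # E(X) = 1
sum(x^2 * px)        # E(X^2) = 2
sum((x + 1) * px)    # E(X + 1) = 2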

Notes

  1. {\rm E}(X) is also called the expectation or the mean of X.
  2. We also use the symbol \mu_X for {\rm E}(X). (\mu is the Greek letter mu.)
  3. In general, {\rm E}\left(X^2\right) \ne \left[{\rm E}\left(X\right)\right]^2.

Properties of expectation

E1. For any constant c, {\rm E}(c) = c.

E2. For constants a and b, {\rm E}(aX+b) = a{\rm E}(X) + b.

E3. For random variables X and Y, {\rm E}(X+Y) = {\rm E}(X) + {\rm E}(Y).

E4. {\rm E}(X-\mu_X) = {\rm E}\left[X-{\rm E}\left(X\right)\right] = 0.

The last result is similar to \sum_{i=1}^n (x_i - \overline x) = 0.

Proof

E1. A random variable that is constant with value c takes only the value c with probability 1.

x c
p_X(x) 1

Then

    \[{\rm E}(X) = (c)(1) = c.\]

E2. We use properties of summations for this proof.

    \begin{align*} {\rm E}(aX+b) &= \sum_x(ax+b)\ p_X(x)\\ &= \sum_x \left(axp_X(x) + bp_X(x)\right) \\ &= \sum_xaxp_X(x) + \sum_x bp_X(x) \quad{\rm by\ summation\ rule\ S3}\\ &= a\sum_xx\ p_X(x) + b\sum_x p_X(x) \quad{\rm by\ summation\ rule\ S2},\\ &= a{\rm E}(X) + b \end{align*}

where in the last line we have used the fact that \sum_x p_X(x) = 1.

E3. This proof requires the properties of joint distributions, so we will omit it.

E4. First we note that \mu_X = E(X) is a constant (that is, its expectation is itself: E(\mu_X) = \mu_X).

    \[{\rm E}\left(X-\mu_X\right) = {\rm E}(X) - {\rm E}(\mu_X) ={\rm E}(X) - \mu_X = \mu_X - \mu_X =0.\]

4.6 Variance

For a random variable X, we define the  variance of X, denoted {\rm Var}(X), by

    \[{\rm Var}(X)= {\rm E}\left[\left(X-\mu_X\right)^2\right] = {\rm E}\left[\left(X-{\rm E}\left(X\right)\right)^2\right].\]

Compare with the definition of the variance for data:

    \[s^2 = \frac{1}{n-1}\sum_{i=1}^n \left(x_i-\overline x\right)^2,\]

which is an “average” of \left(x_i-\overline x\right)^2.

We also write \sigma_X^2 for Var(X).

The standard deviation of X, denoted \sigma_X, is the positive square root of variance, that is,

    \[\sigma_X= \sqrt{{\rm Var}(X)}.\]

Example 4.6

x -1 0 1 2
p_X(x) 0.1 0.2 0.3 0.4

From Example 4.5,

{\rm E}(X) = \mu_X = 1, so

    \begin{align*} {\rm Var}(X) &= {\rm E}\left[\left(X-\mu_X\right)^2\right]\\ &= (-1-1)^2(0.1) + (0-1)^2(0.2) +(1-1)^2(0.3) +(2-1)^2(0.4)\\ &= 1. \end{align*}
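
In R the variance can be computed either from the definition or from the shortcut form {\rm E}(X^2) - \left[{\rm E}(X)\right]^2 given as property V2 below; both give the same value. A quick sketch:

x <- c(-1, 0, 1, 2)
px <- c(0.1, 0.2, 0.3, 0.4)
mu <- sum(x * px)            # E(X) = 1
sum((x - mu)^2 * px)         # Var(X) from the definition = 1
sum(x^2 * px) - mu^2         # Var(X) from E(X^2) - [E(X)]^2 = 1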

Properties of variance

V1. Var(X) \ge 0.

V2. Var(X) = {\rm E}\left(X^2\right) - \left[ {\rm E}(X)\right]^2. This form is simpler for calculating variance.
(Compare with \sum_{i=1}^n (x_i-\overline x)^2 = \sum_{i=1}^n x_i^2 -n{\overline x}^2.)

V3. Var(aX+b) = a^2\ {\rm Var}(X).
(Compare with u_i = ax_i + b \Rightarrow s_u^2 = a^2\ s_x^2.)

Proof

V1.

    \[{\rm Var}(X) = {\rm E}\left[\left(X-\mu_X\right)^2\right] = \sum_x \underbrace{\left(x-\mu_X\right)^2}_{\ge 0}\ \underbrace{p_X(x)}_{\ge 0} \ge 0,\]

as each term in the sum is non-negative (\ge 0).

V2.

    \begin{align*} {\rm Var}(X) &= {\rm E}\left[\left(X-\mu_X\right)^2\right]\\ &= {\rm E}\left[\left(X-\mu_X\right)\left(X-\mu_X\right)\right]\\ &= {\rm E}\left[X(X-\mu_X) - \mu_X(X-\mu_X)\right]\\ &= {\rm E}\left[X^2-\mu_X\ X\right] - \mu_X\underbrace{{\rm E}(X-\mu_X)}_{=0}\\ &= {\rm E}\left(X^2\right) - \mu_X {\rm E}(X)\\ &= {\rm E}\left(X^2\right) - \mu_X^2 = {\rm E}\left(X^2\right) -\left( {\rm E}(X)\right)^2. \end{align*}

V3.

    \begin{align*} {\rm Var}(aX+b) &= {\rm E}\left[\left(aX+b-{\rm E}\left(aX+b\right)\right)^2\right]\\ &= {\rm E}\left[\left(aX+b-a{\rm E}(X)-b\right)^2\right]\\ &= {\rm E}\left(\left[a(X-\mu_X)\right]^2\right)\\ &= {\rm E}\left[a^2(X-\mu_X)^2\right]\\ &= a^2\ {\rm E}\left[\left(X-\mu_X\right)^2\right]\\ &= a^2\ {\rm Var}(X). \end{align*}

Note that the standard deviation of aX+b, denoted \sigma_{aX+b}, is given by  \sigma_{aX+b} =|a|\sigma_X.

Example 4.7 

A builder is under contract to complete a project in no more than three months or there will be heavy cost overruns. Given the timings of scheduled work, the manager of the construction believes that the job can be finished in either 2, 2.5, 3 or 3.5 months, with
corresponding probabilities 0.1, 0.2, 0.4 and 0.3. Find the expected completion time and the variance, and interpret these quantities.

Solution

Step 1 Define an appropriate random variable.

Let the random variable X denote the time to complete the project.

Step 2 Determine its distribution.

x 2 2.5 3 3.5
p_X(x) 0.1 0.2 0.4 0.3

    \begin{align*} {\rm E}(X) &= 2\times 0.1 + 2.5\times 0.2 + 3\times 0.4 + 3.5\times 0.3 = 2.95 {\rm \ months}\\ {\rm E}(X^2) &= 2^2\times 0.1 + 2.5^2\times 0.2 + 3^2\times 0.4 + 3.5^2\times 0.3 = 8.925\\ {\rm Var}(X) &= {\rm E}(X^2) - \left[{\rm E}(X)\right]^2 = 8.925 - 2.95^2 = 0.2225.\\ \sigma_X &= \sqrt{0.2225} = 0.4717 {\rm\ months} \end{align*}

The mean time to completion is 2.95 months, with a standard deviation of 0.47 months. In addition, the probability that the construction time exceeds 3 months is 0.3, which is quite large. From this we conclude that there is a substantial risk that the construction time will exceed 3 months.
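
These calculations can be reproduced in R directly from the pmf (a minimal sketch):

x <- c(2, 2.5, 3, 3.5)
px <- c(0.1, 0.2, 0.4, 0.3)
mu <- sum(x * px)            # E(X) = 2.95 months
v <- sum(x^2 * px) - mu^2    # Var(X) = 0.2225
sqrt(v)                      # standard deviation = 0.4717 months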

Example 4.8

A square window frame produced by a machine has sides with mean length 2.5 m and variance 0.1 m^2. (Assume that the sides of each window are exactly equal.)

(a) Find the mean and variance of the perimeter of the frames.
(b) What is the mean area of the frames?

Solution

Let the random variable X denote the length of a side of the window. Then {\rm E}(X) = 2.5 and {\rm Var}(X) = 0.1.

(a) Let the random variable P denote the perimeter of the window. Then P = 4X, so

    \begin{align*} {\rm E}(P) &= 4{\rm E}(X) = 4\times 2.5 = 10 {\rm \ m}.\\ {\rm Var}(P) &= 4^2\ {\rm Var}(X) = 16\times 0.1 = 1.6 {\rm \ m}^2. \end{align*}

(b) Let the random variable A denote the area of the window. Then A = X^2. Now

    \begin{align*} {\rm Var}(X) &= {\rm E}(X^2) - \left[{\rm E}(X)\right]^2\\ \Rightarrow {\rm E}(A) &= {\rm E}(X^2) = {\rm Var}(X) + \left[{\rm E}(X)\right]^2 = 0.1 + 2.5^2 = 6.35 {\rm m}^2. \end{align*}

Standardised random variable

Let the random variable X have mean \mu_X and standard deviation \sigma_X. Put

    \[Z= \frac{X-\mu_X}{\sigma_X} = \underbrace{\frac{1}{\sigma_X}}_{a}\ X - \underbrace{\frac{\mu_X}{\sigma_X}}_{b}.\]

Then

    \[{\rm E}(Z) = {\rm E}(aX-b) = a{\rm E}(X) - b = \frac{1}{\sigma_X}\ \mu_X - \frac{\mu_X}{\sigma_X} = 0,\]

    \[{\rm Var}(Z) = {\rm Var}(aX-b) = a^2{\rm Var}(X) = \frac{1}{\sigma_X^2} {\rm Var}(X) = \frac{1}{\sigma_X^2}\times \sigma_X^2 = 1.\]

We call Z a standardised random variable. This result is similar to that for standardising data (see Section 3.8 in Chapter 3):

    \[z_i=\frac{x_i-\overline x}{s} \Rightarrow \overline z = 0 {\rm \ and \ } s^2_z = 1.\]

Note

    \begin{align*} Y &= aX+b\\ \Rightarrow {\rm Var}(Y) &= a^2 {\rm Var}(X)\\ {\rm and\ } \sigma_Y &= \left |a\right|\ \sigma_X, \end{align*}

since standard deviation is always non-negative. This is similar to the result for
linear transformation of data. Note that

    \[\left |a\right| = \begin{cases} a {\rm \ if\ } a \ge 0,\\ -a {\rm \ if\ } a < 0. \end{cases}\]

Risk in investment

In investments, one measure of risk is the standard deviation of the return. So if two investments have similar mean returns, the one with the larger standard deviation has larger risk. Usually the investment with larger mean return has larger risk (standard deviation).

4.7 Bernoulli distribution

The random variable X has a Bernoulli distribution if it takes only two values, 0 and 1, with P(X=1) = p and P(X=0) = 1-p, 0 \le p \le 1. We write X \sim {\rm Bern}(p). (The symbol \sim is read “is distributed as”, or “has the distribution”.)

The pmf of X \sim {\rm Bern}(p) is given below, and from this the mean and variance of X can be found.

x 0 1
p_X(x) 1-p p

    \begin{align*} {\rm E}(X) &= (0)(1-p) + (1)(p) = p\\ {\rm E}(X^2) &= (0)^2(1-p) + (1)^2p = p\\ {\rm Var}(X) &= p-p^2 = p(1-p). \end{align*}

    \[{\rm E}(X) = p\]

    \[{\rm Var}(X) = p(1-p).\]

Example 4.9

(a) Toss a fair coin once, and let the rv X denote the number of H. Then X \sim {\rm Bern}(0.5).

(b) Toss a fair die once and let the rv X denote the number of fives or sixes tossed. Then X \sim {\rm Bern}(1/3).

(c) In the current economic environment the probability that the RBA will raise interest rates next month is estimated as 0.75. Let the rv X take the value 1 if the interest rate goes up and 0 otherwise. Then X \sim {\rm Bern}(0.75). The mean and variance of X are

    \[{\rm E}(X) = p = 0.75, {\rm Var}(X) = p(1-p) = 0.75\times 0.25 = 0.1875.\]

4.8 Binomial distribution

Toss a fair coin ten times. What is the probability of obtaining five heads? This example encapsulates the following key features, which are common in many situations.

  1. A fixed number n of independent and identical trials.
  2. Each trial has exactly two possible outcomes, denoted success (S) and failure (F).
  3. The probability of success is p which is fixed throughout the trials.
  4. Let the rv X denote the number of successes in these trials.

Then X has a binomial distribution with parameters n and p. We write X \sim {\rm Bin}(n,p).

Note that in this notation, {\rm Bern}(p) \equiv {\rm Bin}(1,p).

Probabilities for Bin(n,p)

The probability that the stock market falls on any given day is 0.3. Assume that the market falls or rises independently of the previous day’s performance. What is the probability that in a week of six trading days, the market will fall in two of them?

We can list the sample space here. Let F denote that the market falls and N that it does not. Then the sample space is

NNNNNN, FNNNNN, NFNNNN, NNFNNN, NNNFNN, NNNNFN, NNNNNF, FFNNNN, FNFNNN, FNNFNN, …

Now by independence,

    \[P(FNNNNN) = 0.3(0.7)^5\]

Observations

  1. The two Falls can occur on any two of the six days.
  2. The probability of any of the sequences containing two Falls is the same, that is, 0.3^2\ 0.7^4.
  3. If we can COUNT the number of ways of getting two Falls out of six, then we can find the probability of two Falls in six days by multiplying this number by the probability 0.3^2\ 0.7^4.

COUNTING TECHNIQUES — Combinatorics

In effect, in the above example we need to select the two days out of six on which the Falls occur. In general, the number of ways of selecting r things out of n is given by

    \[C^n_r=\begin{pmatrix}n\\ r \end{pmatrix} = \frac{n!}{r!\left(n-r\right)!},\]

where

    \[n! = {\rm n\ factorial\ } = n(n-1)(n-2)\ldots (2)(1).\]

For this notation to make sense, we define 0! = 1. For example,

    \[\dbinom{n}{n}=\frac{n!}{n!\underbrace{(n-n)!}_{=0!=1}} = 1, \qquad \dbinom{n}{1} = \frac{n!}{1!(n-1)!}=\frac{n(n-1)!}{1!(n-1)!} = n,\]

    \[\dbinom{6}{2} = \frac{6!}{\underbrace{2!\,4!}_{\text{add to } 6}} = \frac{6\times 5\times 4!}{2!\,4!} = 15, \qquad \dbinom{10}{2}=\frac{10!}{2!\,8!} = \frac{10\times 9\times 8!}{\underbrace{2!\,8!}_{\text{add to } 10}} = 45.\]

Example 4.10

(a) In how many ways can two stocks be selected for purchase out of 10?

(b) A part-time worker works only two days a week. In how many possible ways can he select his weekly roster? Assume a five-day week.

(c) On a two-day trip to Singapore, I want to stay in a different hotel each day. There are six hotels in the locality I want to stay in. How many different choices of hotel combinations do I have?

Solution

(a)

    \[\dbinom{10}{2} = \frac{10!}{2!\,8!} = \frac{10\times 9\times 8!}{2!\,8!} = 45.\]

(b)

    \[\dbinom{5}{2} = \frac{5!}{2!\,3!} = \frac{5\times 4\times 3!}{2!\,3!} = 10.\]

(c)

    \[\dbinom{6}{2} = \frac{6!}{2!\,4!} = \frac{6\times 5\times 4!}{2!\,4!} = 15.\]

Note

On most calculators you can quickly compute combinations using the button

    \[{{n}\choose{r}} \qquad \text{or} \qquad {^nC_r}.\]

We can also perform this calculation in R, as below.

choose(10,2)
[1] 45
choose(5,2)
[1] 10
choose(6,2)
[1] 15

Probabilities for X\sim {\rm Bin}(n,p)

First note that P(X=k)=0 for k<0 and k>n. Next, P(X=k) for k=0,1,\ldots,n indicates that there are k successes and the remaining n-k trials are failures. Thus the probability mass function for X \sim {\rm Bin}(n,p) is

    \[p_X(k) = P(X=k) = \begin{pmatrix} n\\ k \end{pmatrix}\ p^k\ (1-p)^{n-k}, \quad k=0,1,2,\ldots,n\]

where

  1.  

        \[\begin{pmatrix}n\\ k \end{pmatrix} = \frac{n!}{k!\left(n-k\right)!}\]

    is the number of ways of obtaining k successes from n trials;

  2. p^k is the probability of the k successes;
  3.  \left(1-p\right)^{n-k} is the probability of the n-k failures.
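
As a check, the formula can be evaluated directly in R and compared with the built-in function dbinom(). The sketch below uses n = 6, p = 0.3 and k = 2, the values from the market-fall example.

n <- 6; p <- 0.3; k <- 2
choose(n, k) * p^k * (1 - p)^(n - k)   # from the formula: 0.324135
dbinom(k, n, p)                        # built-in pmf: 0.324135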

The cumulative distribution function

The cdf for X \sim {\rm Bin}(n,p) is

    \begin{align*} F_X(k) &= P(X\le k)\\ &= P(X=0) + P(X=1) + \ldots + P(X=k)\\ &= \sum_{x=0}^k\begin{pmatrix} n\\ x \end{pmatrix}\ p^x\ (1-p)^{n-x}, \quad k=0,1,2,\ldots,n. \end{align*}
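
In R the cdf is given by pbinom(), which agrees with summing the pmf term by term. A quick check with n = 6 and p = 0.3:

sum(dbinom(0:2, 6, 0.3))   # F_X(2) = P(X <= 2) by summing the pmf, about 0.7443
pbinom(2, 6, 0.3)          # the same value from the cdf function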

Note 

Probabilities for the binomial distribution can be obtained:

  1.  using the formula — this will be used for a few examples only;
  2.  using tables — tables are available only for certain values of n and p ;
  3.  using software such as R — this will be our preferred method.

Example 4.11

The probability that the stock market falls on any given day is 0.3. Assume that the market falls or rises independently of the previous day’s performance. What is the probability that in a week of six trading days,

(a) the market falls on two of them?

(b) the market falls on at least one day?

Solution

Let the rv X denote the number of days that the market falls out of six. Then X\sim {\rm Bin}(6,0.3).

(a)

    \[P(X=2) = \dbinom{6}{2}\ 0.3^2\ 0.7^4 = 0.3241.\]

Using R,

dbinom(2,6,0.3)
[1] 0.324135

(b) Now we need

    \begin{align*} P(X \ge 1) &= 1-P(X=0) = 1-\dbinom{6}{0}\ 0.3^0\ 0.7^6\\ &= 1- 0.1176 = 0.8824. \end{align*}

Again, using R,

pbinom(0,6,0.3, lower.tail = F)
[1] 0.882351

Note that by default pbinom() gives the lower tail probability, that is, P(X \le k). The option lower.tail = F gives the corresponding upper tail probability, that is, P(X > k). These are illustrated with the code below.

> pbinom(0,6,0.3)
[1] 0.117649
> 1 - pbinom(0,6,0.3)
[1] 0.882351
> pbinom(0,6,0.3, lower.tail = F)
[1] 0.882351

A note on R

In R, the base function for the binomial distribution is binom. This can be used in three ways, determined by the letter placed before it. So dbinom(k, n, p) gives P(X = k) for the X \sim {\rm Bin}(n,p) distribution. Similarly, pbinom(k, n, p) gives P(X \le k), and rbinom(k, n, p) gives a simulation of k random values from a {\rm Bin}(n,p) distribution.

We will see a similar syntax for other probability distributions.

Example 4.12

Fiddler crabs are tiny semi-terrestrial marine crabs found in mangrove swamps in the tropical belt.

Photograph of a small fiddler crab on sand, holding up its large claw defensively
Fiddler Crab
Image by Richard Alexander from Pixabay

An ecologist is studying the roaming behaviour of fiddler crabs. He identifies 20 burrows that are occupied by fiddler crabs. Every afternoon he checks the burrows for presence of the crabs. The probability that a crab is in its burrow is 0.2. Assume that the crabs are present independently of each other. What is the probability that the number of burrows with a crab is

(a) zero,

(b) exactly 5,

(c) at least 10,

(d) at most 5.

Solution

Let the rv X denote the number of burrows out of 20 that have a crab. Then X \sim {\rm Bin}(20,0.2).

(a) P(X = 0) = \binom{20}{0}\ (0.2)^0\ (0.8)^{20} = 0.0115.

(b) P(X=5 ) = \binom{20}{5}\ (0.2)^5\ (0.8)^{15} = 0.1746.

(c) P(X \ge 10) = 1-P(X \le 9) = 1- 0.9974 = 0.0026, using R,

pbinom(9,20,0.2, lower.tail = F)
[1] 0.002594827

(d) P(X \le 5) = 0.8042, using R,

pbinom(5,20,0.2)
[1] 0.8042078

MEAN AND VARIANCE OF {\rm Bin}(n,p)

    \[{\rm E}(X) = np\]

    \[{\rm Var}(X) = np(1-p)\]

    \[\sigma_X = \sqrt{np(1-p)}\]
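
These formulas can be verified numerically in R by computing the mean and variance directly from the pmf. A sketch using n = 20 and p = 0.2, the values from Example 4.12:

n <- 20; p <- 0.2
x <- 0:n
sum(x * dbinom(x, n, p))                 # E(X) = np = 4
sum(x^2 * dbinom(x, n, p)) - (n * p)^2   # Var(X) = np(1 - p) = 3.2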

Proof

For the proof we express the binomial distribution as a sum of n individual {\rm Bern}(p) trials. Let the rv X denote the number of successes in n independent and identical Bernoulli trials, and let Y_1, Y_2, \ldots, Y_n represent the number of successes in the individual trials. Then for i=1,2,\ldots,n,

    \[Y_i=\begin{cases} 1, \text{\ if trial $i$ is a success}\\ 0, \text{\ otherwise.} \end{cases}\]

Now the Y_i \sim {\rm Bern}(p) are independent, with {\rm E}(Y_i) = p and {\rm Var}(Y_i) = p(1-p). Further,

    \[X = Y_1+Y_2+ \ldots + Y_n,\]

so

    \begin{align*} {\rm E}(X) &= {\rm E}(Y_1+Y_2+ \ldots + Y_n)\\ &= {\rm E}(Y_1) + {\rm E}(Y_2) + \ldots + {\rm E}(Y_n)\\ &= \underbrace{p+p+ \ldots + p}_{n \text{ times}} = np.\\ {\rm Var}(X) &= {\rm Var}(Y_1+Y_2+\ldots +Y_n)\\ &= {\rm Var}(Y_1) + {\rm Var}(Y_2) + \ldots + {\rm Var}(Y_n) \ {\rm (by \ independence)}\\ &= \underbrace{p(1-p)+p(1-p)+ \ldots + p(1-p)}_{n \text{ times}}\\ &= np(1-p). \end{align*}

Note that we have used the result that the variance of a sum of independent random variables is the sum of the variances, proved in Chapter 6.

Example 4.13

A farmer plants mango trees in clusters of  20 per 100 square metres. Each tree in a cluster will produce fruit this year with probability 0.6. The farmer has 5 such clusters.

(a) What is the probability that in a cluster more than 15 trees will produce fruit this year?

(b) What is the probability that in at least one cluster more than 15 trees will produce fruit this year?

(c) What is the expected total number of trees that will produce fruit this year?

(d) What assumption(s) have you made in your solution?

Solution

(a) Let the rv X denote the number of trees in a cluster that produce fruit this year. Then X \sim {\rm Bin}(20,0.6), so

    \begin{align*} P(X > 15) &= 1-P(X \le 15)\\ &= 1- 0.9490 \quad{\rm (software)}\\ &= 0.0510. \end{align*}

(b) Let the rv Y denote the number of clusters in which more than 15 trees produce fruit. Then Y \sim {\rm Bin}(5,0.0510), and

    \begin{align*} P(Y\ge 1) &= 1-P(Y =0)\\ &= 1-\binom{5}{0}\ (0.0510)^0\ (0.9490)^5\\ &= 1-0.7697\\ &= 0.2303. \end{align*}
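
Parts (a) and (b) can also be computed in R, as below (a sketch; the object name p_cluster is ours):

p_cluster <- pbinom(15, 20, 0.6, lower.tail = F)   # (a) P(X > 15), approximately 0.0510
1 - (1 - p_cluster)^5                              # (b) P(Y >= 1), approximately 0.230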

(c) We can approach this problem in two ways.

Let X_i denote the number of trees that fruit in cluster i, for i=1,2,\ldots,5. Then X_i \sim {\rm Bin}(20,0.6), so {\rm E}(X_i) = 20\times 0.6 = 12. Now put T = X_1 + X_2 + \ldots + X_5, the total number of trees that fruit. Then

    \begin{align*} {\rm E}(T) &= {\rm E}\left(\sum_{i= 1}^5 X_i\right)\\ &= \sum_{i=1}^5 {\rm E}(X_i)\\ &= 5 \times 12 = 60 {\rm \ (since\ all\ the\ means\ are\ equal)} \end{align*}

Alternatively, we can consider the number of trees that fruit as a set of 20 Bernoulli trials for each cluster, giving a total of 100 trials. Then the total number of trees that fruit is T \sim {\rm Bin}(100,0.6), so {\rm E}(T) = 100\times 0.6 = 60 as before.

(d) We have assumed that each tree in a cluster fruits independently of the other trees, and that trees in different clusters also fruit independently. We have also assumed that the probability of a tree fruiting is fixed at 0.6. The independence assumption may not hold, as the fruiting of trees within a cluster is most likely dependent.

4.9 Poisson distribution

The Poisson distribution models the number of occurrences of a phenomenon in a fixed interval or fixed time period or fixed area or fixed volume. Here volume is a generic term and can be interpreted as any fixed unit. This is a counting process, similar to the binomial distribution.

Examples are:

  1. The number of stars in a region (volume) of the galaxy.
  2. The number of aphids on a leaf of a rose bush.
  3. The number of arrivals per minute at a toll bridge.
  4. The number of times a printer breaks down in a month.
  5. The number of paint spots on a new car.
  6. The number of sewing flaws per pair of jeans during production.
  7. The number of eggs laid by an ostrich in a season.

Assumptions of the Poisson distribution

  1.  The occurrences are independent of each other.
  2.  Two occurrences cannot happen at the same location.
  3. The mean number of occurrences in a specified volume is fixed.

Comparison with Bin(n,p)

  1. The binomial distribution has a fixed number of trials. The Poisson distribution does not.
  2. The binomial distribution has two parameters, n and p. The Poisson distribution has only one: the mean number of occurrences.
  3. The binomial distribution has two possible outcomes, Success and Failure, at each trial. The Poisson distribution does not.
  4. A binomial random variable takes the values 0,1,\ldots,n. A Poisson random variable takes the values 0, 1, 2, \ldots

Probability mass function

Let the random variable X have a Poisson distribution with parameter \lambda,  the mean number of occurrences in a fixed volume. We write X \sim {\rm Poi}(\lambda). Then

    \[p_X(k) = P(X=k) = \frac{e^{-\lambda}\ \lambda^k}{k!}, \quad k=0,1,2,\ldots\]

The cumulative distribution function of X is given by

    \[P(X \le k) = F_X(k) = \sum_{x=0}^k \frac{e^{-\lambda}\ \lambda^x}{x!}, \quad k=0,1,2,\ldots\]

that is, simply add up the probabilities up to and including k.
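
In R the functions dpois() and ppois() give the Poisson pmf and cdf respectively, so the cdf can be checked by summing the pmf. A sketch with \lambda = 2.7, the mean used in Example 4.14:

sum(dpois(0:4, 2.7))   # F_X(4) = P(X <= 4) by summing the pmf, about 0.8629
ppois(4, 2.7)          # the same value from the cdf function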

Exercise

Do the probabilities add up to 1?

Example 4.14
Poole studied 41 African male elephants in Amboseli National Park in Kenya for a period of 8 years and recorded the age of each elephant and the number of successful matings. (Poole, J. (1989). ‘Mate Guarding, Reproductive Success and Female Choice in African Elephants’. Animal Behavior (37): 842–849.) The mean number of successful matings in 8 years was 2.7.

(a) What is the probability of exactly 2 successful matings in eight years by an adult male elephant?
(b) What is the probability of no successful matings in eight years by an adult male elephant?

(c) What is the probability of at least 5 successful matings in eight years by an adult male elephant?
(d) What is the probability of more than 5 successful matings in eight years by an adult male elephant?

(e) What is the probability of exactly 2 successful matings in 16 years by an adult male elephant?

Solution

Let the rv X denote the number of successful matings in 8 years by an adult male elephant. Then X \sim {\rm Poi}(2.7).

(a)

    \[P(X=2) = \frac{e^{-2.7}(2.7)^2}{2!} = 0.2450\]

Using R, the base function is pois(). For the individual probability for this part,

dpois(2, 2.7)
[1] 0.2449641

(b)

    \[P(X=0) = \frac{e^{-2.7}(2.7)^0}{0!} = 0.0672\]

Using R,

dpois(0, 2.7)
[1] 0.06720551

(c)

    \[P(X\ge 5) = 1-P(X \le 4) = 1 - 0.8629 = 0.1371,\]

using R,

ppois(4,2.7, lower.tail = F)
[1] 0.1370921

(d)

    \[P(X > 5) = 1-P(X \le 5) = 1-0.9433 = 0.0567,\]

using R,

ppois(5,2.7, lower.tail = F)
[1] 0.05673167

(e) Let the random variable Y denote the number of successful matings in 16 years by an adult male elephant. Now the time interval has doubled, so the mean number of successful matings in 16 years is 2 \times 2.7 = 5.4. Then Y \sim {\rm Poi}(5.4).

    \[P(Y = 2) = \frac{e^{-5.4}(5.4)^2}{2!} = 0.0659.\]

Using R,

dpois(2, 5.4)
[1] 0.06585175

MEAN AND VARIANCE OF Poi(\lambda).

X \sim Poi(\lambda)

{\rm E}(X) = \lambda

{\rm Var}(X) = \lambda

\sigma_X = \sqrt{\lambda}
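
A numerical check of these results in R, computing the mean and variance from the pmf (summing over a long but finite range, since the tail probabilities beyond it are negligible):

lambda <- 2.7
x <- 0:200
sum(x * dpois(x, lambda))                # E(X) = 2.7
sum(x^2 * dpois(x, lambda)) - lambda^2   # Var(X) = 2.7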

4.10 Hypothesis test for Binomial Proportion

Consider the following scenarios.

  1. The probability a fiddler crab is in its burrow at sunset is 0.6 in summer. An ecologist believes this proportion is higher in winter. She examines 20 burrows at sunset in July in Queensland and finds 14 crabs. Is the probability that a fiddler crab is in its burrow at sunset higher in winter?
  2. A marketing campaign was launched with the aim of increasing market share for Coles Supermarket, which currently stands at 60%. A survey of 1000 randomly selected shoppers found that 650 of them shopped at Coles. Has the market share of Coles increased?
    Note that the survey gives the market share of Coles as 650/1000 = 0.65, or 65%.
  3. The probability of a stillbirth in the general population is 0.01. A medical researcher believes that this rate is higher in the migrant population. Data over a month shows that out of a total of 1,000 pregnancies in migrant women, 15 resulted in stillbirths. Is the proportion of stillbirths higher in the migrant population?

Observations

  1. All the above questions deal with assessing a binomial probability based on data. This falls under the topic of inference.
  2. The method commonly used for this particular type of problem depends on the sample size, that is, the number of observations or the number of units or the value of n.

We will illustrate the ideas below using the simple coin tossing problem: A coin is tossed 20 times and results in 13 heads. Is the coin biased in favour of head?

Let the random variable X denote the number of heads obtained in 20 tosses of the coin. Then X \sim {\rm Bin}(20,p). This is the test statistic, that is, we will use the value of this random variable to test the hypotheses of interest. Here the value of p is not known, and we want to make a decision regarding its value.
Note that the binomial distribution has two parameters: n and p. Here the value of n=20 is known, and the inference deals with the value of the parameter p.

Inference ALWAYS deals with a population parameter.

Steps

  1. State the hypotheses to be tested. That is, state what the current belief/value is for the binomial probability, and the belief or change expected. Null Hypothesis: the current belief, or belief of no change. In the coin tossing example, we begin by believing that the coin is fair. (If we believe the coin is biased in favour of heads, we need to know the bias, that is, what is the probability of tossing a H?)

        \[H_0: \text{Coin is fair, that is} \qquad H_0: p = 0.5\]

  2. Alternative hypothesis: this is really the value of the parameter under the stated test: Is the coin biased in favour of heads? If the coin is biased in favour of heads, then what is the value of p?

        \[\text{Coin is biased, that is} \qquad H_1: p > 0.5\]

  3. p-value: We need some way of assessing if the coin is fair based on our data. Here, the observed number of heads, x_{obs}, is 13. We base our assessment on probabilities. What is the probability of obtaining 13 heads in 20 tosses if the coin is in fact fair? Under the null hypothesis, the distribution of the test statistic is Bin(20, 0.5). This is called the null distribution. Then

        \[P(X=13\mid p=0.5) = 0.0739\]

    Note that 13 is the observed number of heads and p=0.5 is the value of p assumed under H_0.

  4. Is this probability large or small? Are we likely to get 13 heads if the coin is really fair?
  5. If we think that 13 heads are too many for a fair coin, then any more than 13 will also be considered too many for a fair coin. So we should really be evaluating P(X\ge 13\mid p=0.5). We call this the p-value (an R check of this calculation is given after these steps):

        \begin{align*} {\rm p-value} &= P(X\ge 13\mid p=0.5) = 1-P(X \le 12 \mid p=0.5)\\ &= 1 - 0.8684 = 0.1316.\end{align*}

  6. Decision: if the p-value is small then the data is inconsistent with the null hypothesis. Then:
    either the coin is not fair, that is, it is biased in favour of heads; or
    we have unusual data, or a rare event. In hypothesis testing we ignore the occurrence of rare events. Thus we would conclude that the coin is biased in favour of heads if the p-value is considered to be small.
  7. Significance level. If the p-value is less than the significance level, then it is deemed to be too small. The significance level is set in advance, before the hypothesis test is conducted. Usually the significance level is set at 2.5% (0.025) or 5% (0.05). Any event that occurs with probability less than 0.05 is taken to be infrequent or rare. In our example,

        \[\text{p-value} = 0.1316 > 0.025.\]

  8. Conclusion. Since p-value > 0.025, there is insufficient evidence to reject the null hypothesis at the 2.5% level of significance. We conclude that there is insufficient evidence that the coin is biased.
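
The p-value in step 5 can be obtained in R either with pbinom() or with the exact binomial test binom.test(); both give the same value. A sketch:

pbinom(12, 20, 0.5, lower.tail = F)                            # P(X >= 13 | p = 0.5) = 0.1316
binom.test(13, 20, p = 0.5, alternative = "greater")$p.value   # same p-value from the exact test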

Writing Conclusions to Hypothesis tests

The conclusion must be written in terms of the question of interest in clear, simple terms. It must

  • answer the question of interest,
  • be unambiguous,
  • be impossible to misinterpret,
  • be in the language of the context/discipline/subject.

Note 

The steps in the coin tossing example above are for explaining the method. In practice we do not need to list them. The next example shows a more direct and simplified solution.

Example 4.15

A garden centre claims that only 10% of their mango seeds fail to germinate. A mango farm trials 20 mango seeds from the garden centre and finds that 4 of them do not germinate. Is the garden centre’s claim incorrect? Use a significance level of 0.025 (2.5%).

Solution

Let the random variable X denote the number of seeds that do not germinate out of 20.
Then X\sim {\rm Bin}(20,p), where p is the proportion of seeds that do not germinate.

    \[H_0: p =0.1\quad H_1: p > 0.1\]

Note that the null hypothesis states the null belief, that the claim of the garden centre is true. The alternative hypothesis states the statement to be tested, that more than 10% of seeds do not germinate. The observed value of the test statistic is x_{obs} = 4. The p-value of the test is

    \[P(X \ge 4\mid p = 0.1) = 1-P(X \le 3\mid p=0.1) = 1-0.8670 = 0.1330 > 0.025,\]

so the data provides insufficient evidence against the null hypothesis. We conclude on the basis of this analysis that there is no reason to doubt the garden centre’s claim.
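
The p-value can be computed in R as below; the exact binomial test binom.test() gives the same value.

pbinom(3, 20, 0.1, lower.tail = F)                             # P(X >= 4 | p = 0.1) = 0.1330
binom.test(4, 20, p = 0.1, alternative = "greater")$p.value    # same p-value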

Notes

  1. In the statements of the hypotheses, the null hypothesis always contains the equality value. Since the equality value is used to calculate the p-value, it is sufficient to state the null hypothesis as H_0: p = p_0.
  2. The alternative hypothesis never includes equality. So it will be stated as H_1: p > p_0 (upper-sided test) or H_1: p < p_0 (lower-sided test) or H_1:p \ne p_0 (two-sided test).
  3. If a significance level is not specified then we use 2.5% (0.025) for a one-sided test and 5% (0.05) for a two-sided test. The justification for this will be provided later.
  4. For a one-sided test, the probability expression for the p-value follows the same direction as the alternative hypothesis. Thus if H_1: p > p_0 then p-value = P(X \ge x_{obs}\mid p = p_0). Similarly, if H_1: p < p_0 then p-value = P(X \le x_{obs}\mid p = p_0). We will see examples of two-sided tests later.
  5. Usually one has an indication of the direction of change expected, so it is preferable to state a one-sided alternative hypothesis.

Example 4.16 A genetics example

Mouse genomes have 19 non-sex chromosome pairs, and X and Y sex chromosomes (females have two copies of X, males one each of X and Y). The total proportion of mouse genes on the X chromosome is 6.1%. 25 mouse genes are involved in sperm formation. An evolutionary theory states that these genes are more likely to occur on the X chromosome than elsewhere in the genome because recessive alleles that benefit males are acted on by natural selection more readily on the X than on autosomal (non-sex) chromosomes. On the other hand, the independence chance model would expect only 6.1% of the 25 genes to be on the X chromosome. In the mouse genome, 10 of 25 genes (40%) are on the X chromosome. Is the independence chance model appropriate for the mouse genome?

Solution

Let the random variable X denote the number of genes, out of 25, on the X chromosome. Then X \sim {\rm Bin}(25, p), where p denotes the proportion of genes on the X chromosome. The hypotheses of interest are:

    \[H_0: p = 0.061, \quad H_1: p > 0.061.\]

That is, we assume (H_0) that the independence model holds, and the alternative hypothesis is in favour of the evolutionary hypothesis. The observed number of genes on the X chromosome is 10, so the p-value of the test is

    \[{\rm p-value\ } = P(X \ge 10\mid p = 0.061) = 1-P(X \le 9 \mid p = 0.061) \approx 0,\]

so there is overwhelming evidence against the null hypothesis. We conclude the independence chance model is not appropriate for the mouse genome, and the evolutionary model is more appropriate.
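
A quick check of this p-value in R (a sketch):

pbinom(9, 25, 0.061, lower.tail = F)   # P(X >= 10 | p = 0.061), roughly 1e-06, effectively 0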

4.11 Hypothesis test for Poisson mean

The hypothesis test for the Poisson mean proceeds in the same manner as for binomial proportion. The only difference is that the p-value is calculated using the Poisson distribution. The example below illustrates the method.

Example 4.17 Hypothesis test for Poisson mean

Quality control requires the number of stitching flaws on a garment to be at most one on average. An inspection of a random sample of 3 garments gave a total of 6 flaws. Is the average number of flaws per garment as required? Use a significance level of 2.5%.

Solution

Let the random variable X denote the number of stitching flaws in 3 garments. (This is our test statistic.) Let the average number of flaws  in three garments be \lambda. Then X \sim {\rm Poi}(\lambda).  Now the quality control specification is an average of 1 flaw per garment, so in three garments this translates to an average of 3 flaws. The hypotheses of interest are

    \[H_0: \lambda = 3\quad H_1: \lambda > 3\]

Note that here we could state the null hypothesis as H_0: \lambda \le 3. However, as mentioned previously, the equality value is used for the calculation of the p-value, so it is sufficient to simply state the equality value in the null hypothesis. The observed number of flaws is 6, so the p-value is

    \[{\rm p-value} = P(X \ge 6\mid \lambda = 3) = 1-P(X \le 5\mid \lambda = 3) = 1- 0.9161 = 0.0839 > 0.025,\]

so the data provides insufficient evidence against the null hypothesis. We conclude based on this analysis that the quality control specifications are met.
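
The p-value can be computed in R with ppois(); the exact Poisson test poisson.test() gives the same value. A sketch:

ppois(5, 3, lower.tail = F)                                      # P(X >= 6 | lambda = 3) = 0.0839
poisson.test(6, T = 3, r = 1, alternative = "greater")$p.value   # same p-value from the exact test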
