8 The Normal Distribution

Learning Outcomes

At the end of this chapter you should:

  1. understand the concept of a normal random variable;
  2. be able to calculate probabilities for a standard normal distribution;
  3. be able to solve problems based on general normal distributions;
  4. be able to model and solve problems involving sums and differences of normal distributions;
  5. be able to compute probabilities for a binomial distribution using an appropriate normal approximation;
  6. be able to model and solve problems using a combination of continuous and discrete random variables as appropriate.

 

 

8.1 Introduction

The normal distribution is one of the most important and most widely used continuous distributions.

  1. Let the random variable X have a normal distribution with mean \mu and variance \sigma^2. We write X \sim N(\mu,\sigma^2). Note that the second parameter in this notation is the variance, not the standard deviation.
  2. Let Y = aX + b. Then

        \[{\rm E}(Y) = a\mu + b, \quad {\rm Var}(Y) = a^2 \sigma^2,\]

    and Y also has a normal distribution, Y \sim N(a\mu + b, a^2\sigma^2).
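As a quick numerical check of this result, the following R sketch computes P(Y \le y) in two ways: through the original variable X, and through the stated distribution of Y. The values of mu, sigma, a, b and y are arbitrary illustrative choices, not taken from the text, and a is assumed to be positive.

```r
# Illustrative (assumed) parameter values; any a > 0 would do.
mu <- 2; sigma <- 3
a <- 4; b <- -1
y <- 10

# P(Y <= y) via the original variable: Y <= y  <=>  X <= (y - b) / a  (a > 0).
p_via_x <- pnorm((y - b) / a, mean = mu, sd = sigma)

# P(Y <= y) via the stated result Y ~ N(a * mu + b, a^2 * sigma^2).
# Note that pnorm() takes the standard deviation, a * sigma, not the variance.
p_via_y <- pnorm(y, mean = a * mu + b, sd = a * sigma)

all.equal(p_via_x, p_via_y)  # TRUE
```

Both routes standardise to the same value, (y - b - a\mu)/(a\sigma), which is why the two probabilities agree.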

Standardisation

  1. In particular, if

        \[Z = \frac{X-\mu}{\sigma}=\frac{1}{\sigma}X - \frac{\mu}{\sigma} \quad \left(a=\frac{1}{\sigma}, b = -\frac{\mu}{\sigma}\right)\]

    then {\rm E}(Z) = 0 and {\rm Var}(Z) = 1, so Z \sim N(0,1). We say that Z has the standard normal distribution.

  2. If Z \sim N(0,1) and X=\mu +\sigma Z, then X\sim N(\mu, \sigma^2).
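The standardisation result can be checked numerically. The sketch below, with assumed illustrative values for mu, sigma and x, compares the probability obtained from the standardised value with the one obtained by giving R the mean and standard deviation directly.

```r
mu <- 5; sigma <- 4   # assumed values, for illustration only
x <- 7

z <- (x - mu) / sigma                           # standardised value
p_standard <- pnorm(z)                          # P(Z < z), Z ~ N(0, 1)
p_direct   <- pnorm(x, mean = mu, sd = sigma)   # P(X < x), X ~ N(mu, sigma^2)

all.equal(p_standard, p_direct)  # TRUE
```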

Probability density function—effect of changing the mean

Below are sketched the pdfs of the normal distributions for different values of the mean. Note the effect of changing the mean. The shape of the curve is unchanged, but the central location changes. Note that in each case the curve is symmetric about its mean.

Figure: Plots of normal distributions with different means but the same standard deviation. The three bell-shaped curves are identical in size and shape, but are shifted horizontally along the x axis as the mean changes.
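A plot of this kind can be produced in R with dnorm and curve. The following is a minimal sketch; the three means and the common standard deviation are assumed values chosen only for illustration.

```r
means <- c(-2, 0, 2)   # assumed means, chosen only for illustration
sd_common <- 1         # common standard deviation (also an assumption)

# Draw the three densities on the same axes.
curve(dnorm(x, mean = means[1], sd = sd_common), from = -6, to = 6,
      xlab = "x", ylab = "density")
for (m in means[-1]) {
  curve(dnorm(x, mean = m, sd = sd_common), add = TRUE, lty = 2)
}

# Height of each curve at its own mean: identical, since only the location moves.
peak_heights <- dnorm(means, mean = means, sd = sd_common)
peak_heights
```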

 

Probability density function—effect of changing the standard deviation

Sketched below are the pdfs of the normal distributions for different values of the standard deviation (equivalently, of the variance). Note the effect of changing the standard deviation. The central location (line of symmetry) of the curve is unchanged, but the curve is wider if the standard deviation is larger. Here the mean is 0, and all the curves have x = 0 as the line of symmetry.

Figure: Plots of normal distributions with the same mean but different standard deviations. The three symmetric bell-shaped curves are centred at the same location; the larger the standard deviation, the wider and flatter the curve, and the more gradual its slope down towards the horizontal x axis.
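The effect of the standard deviation can be explored in R in the same way. The sketch below uses three assumed standard deviations; it also checks that the height of each curve at the mean is 1/(\sigma\sqrt{2\pi}), so a larger standard deviation gives a lower peak as well as a wider curve.

```r
sds <- c(0.5, 1, 2)   # assumed standard deviations, for illustration only

# Densities with mean 0 and increasing spread, drawn on the same axes.
curve(dnorm(x, mean = 0, sd = sds[1]), from = -6, to = 6,
      xlab = "x", ylab = "density")
for (s in sds[-1]) {
  curve(dnorm(x, mean = 0, sd = s), add = TRUE, lty = 2)
}

# Peak height at the mean is 1 / (sd * sqrt(2 * pi)): it falls as sd grows.
peak_heights <- dnorm(0, mean = 0, sd = sds)
all.equal(peak_heights, 1 / (sds * sqrt(2 * pi)))  # TRUE

# Each density is symmetric about x = 0.
all.equal(dnorm(-1.3, sd = sds), dnorm(1.3, sd = sds))  # TRUE
```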

Normal distribution tables

Tables list cumulative probabilities for the standard normal distribution. We will illustrate the use of tables by examples. Note that normal probabilities can be easily obtained using software such as R. Several types of tables are available for the standard normal distribution. We will use the one given below. (This table is also available for download.)


The table gives P(0 < Z < z) for z > 0. Note that P(Z < 0 ) = 0.5, so we can obtain

    \[P(Z < z) = 0.5 + P(0 < Z < z).\]
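The relationship between a table entry and the cumulative probability can be reproduced in R. In this sketch the value z = 1 is just an example; table_entry is an illustrative name for the quantity P(0 < Z < z) that this style of table lists.

```r
z <- 1.0   # example value; any z > 0 works

# What this style of table lists: P(0 < Z < z).
table_entry <- pnorm(z) - pnorm(0)   # pnorm(0) is 0.5
round(table_entry, 4)                # 0.3413

# Rebuild the cumulative probability from the table entry.
all.equal(0.5 + table_entry, pnorm(z))  # TRUE
```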

8.2 Normal distribution problems

For X \sim {\rm N}(\mu,\sigma^2), there are usually two types of problems.

  1. Obtain the probability that X lies in a given interval.
  2. Given a probability, find the corresponding interval for x (the inverse problem).

Symmetry of the standard normal distribution about 0 will be used often. The examples below illustrate the ideas.

Example 8.1

Let Z \sim {\rm N}(0,1). Determine the following.

(i) P(Z < 1.0)

Graph of an arch-shaped normal distribution or "bell curve". The curve peaks at 0 on the horizontal x axis and 0.4 on the vertical, tapering off to meet the horizontal axis (i.e., to equal 0 on the vertical axis) just after x = 3 and x = -3. The area under the curve from x = 0 to x = 1 is highlighted and the total of the area is noted as 0.3413.

Solution

    \[P(Z < 1.0) = 0.5 + P(0 < Z < 1) = 0.5 + 0.3413 = 0.8413.\]

Using R,

> pnorm(1)
[1] 0.8413447

(ii) P(Z < -1)

Solution

Graph of an arch-shaped normal distribution or "bell curve". The curve peaks at 0 on the horizontal x axis and 0.4 on the vertical, tapering off to meet the horizontal axis (i.e., to equal 0 on the vertical axis) just after x = 3 and x = -3. Two symmetrical areas under the curve are highlighted: all values above x = 1 and all values below x = -1.

By symmetry, as illustrated in the graph above,

    \[P(Z < -1) = P(Z > 1) = 0.5 - P(0 < Z < 1) = 0.5 - 0.3413 = 0.1587.\]

Using R,

> pnorm(-1)
[1] 0.1586553

(iii) P(Z > 1).

Solution

    \[P(Z > 1) = 0.5 - P(0 < Z < 1) = 0.5 - 0.3413 = 0.1587 = P(Z < -1).\]

Using R,

> pnorm(1, lower.tail = FALSE)
[1] 0.1586553

(iv) P(Z > 2.51)

Solution

Graph of an arch-shaped normal distribution or "bell curve". The curve peaks at 0 on the horizontal x axis and 0.4 on the vertical, tapering off to meet the horizontal axis (i.e., to equal 0 on the vertical axis) just after x = 3 and x = -3. The area under the curve from x = 0 to x = 2.51 is highlighted, and noted to equal 0.4940.

    \[P(Z > 2.51) = 0.5 - P(0 < Z < 2.51) =0.5 - 0.4940 = 0.0060.\]

Using R,

> pnorm(2.51, lower.tail = FALSE)
[1] 0.006036558

(v) P(|Z| < 1.96)

Solution

Graph of an arch-shaped normal distribution or "bell curve". The curve peaks at 0 on the horizontal x axis and 0.4 on the vertical, tapering off to meet the horizontal axis (i.e., to equal 0 on the vertical axis) just after x = 3 and x = -3. The area under the curve from x = 0 to x = 1.96 is highlighted and noted to be equal to 0.4750.

    \[P(|Z| < 1.96) = P(-1.96 < Z < 1.96) = 2\times P(0 < Z < 1.96) = 2 \times 0.475 = 0.9500.\]

Using R,

> pnorm(1.96) - pnorm(-1.96)
[1] 0.9500042

(vi) The value of z such that P(Z < z) = 0.950

Solution

We need to look up the value of the probability in the tables, and then read the value of z it corresponds to. This gives z = 1.645, obtained by interpolation between the neighbouring values. Alternatively, at the bottom of the tables are listed critical points of the normal distribution. This table lists the values of z corresponding to often-used right tail probabilities. Here since P(Z < z) = 0.95, the right tail probability is 1 - 0.95 = 0.05. From the critical points table the value of z this corresponds to is 1.645.

Using R,

> qnorm(0.95)
[1] 1.644854

(vii) The value of z such that P(|Z| < z)=0.95

Solution

The probability between -z and z is 0.95, so the two tails together contain a probability of 0.05. By symmetry, each tail contains a probability of 0.025. From the table of critical values we get z = 1.96.

Using R, we need to work with the lower-tail probability. The probability excluded below -z is 0.025 (and the same above z), so we need the value of z such that P(Z < z) = 0.95 + 0.025 = 0.975. This is the probability we supply to qnorm in R.

> qnorm(0.975)
[1] 1.959964

(viii) P(-1.5 < Z < 2.7)

Solution

Note

    \[P(-1.5 < Z < 0) = P(0 < Z < 1.5) = 0.4332,\]

and

    \[P(0 < Z < 2.7) = 0.4965,\]

so the required probability is 0.4332 + 0.4965 = 0.9297.

Using R,

> pnorm(2.7) - pnorm(-1.5)
[1] 0.9297258

Non-standard normal distribution

Probabilities for non-standard normal distributions can be easily obtained using software. Here we illustrate the method using tables. The method is also useful in other contexts, so it is still worth understanding.

Let X \sim N(\mu,\sigma^2). Then using the standardisation results,

    \[Z = \frac{X-\mu}{\sigma} \sim N(0,1).\]

The following examples illustrate the ideas.

Example 8.2

Let X \sim {\rm N}(5,16). Determine the following.

(i) P(X < 0).

Solution

We standardise X by subtracting its mean and dividing by the standard deviation. Put

    \[Z = \frac{X-5}{4} \sim N(0,1).\]

Then

    \[P(X < 0) = P\left(\frac{X-5}{4} < \frac{0-5}{4}\right) = P(Z < -1.25) = 0.5-0.3944 = 0.1056.\]

Using R, we do not need to work with the standardised normal distribution, as we can simply specify the mean and standard deviation in the R function.

> pnorm(0, mean = 5, sd = 4)
[1] 0.1056498

(ii) P(X > 10).

Solution

    \[P(X > 10) = P\left(Z > \frac{10-5}{4}\right) = P(Z > 1.25) = 0.5 - 0.3944 = 0.1056.\]

Using R,

> pnorm(10, mean = 5, sd = 4, lower.tail = FALSE)
[1] 0.1056498

(iii) P(−5 \le X \le 7).

Solution

We standardise both ends.

    \begin{align*} P(-5 \le X \le 7) &= P\left(\frac{-5-5}{4} \le Z \le \frac{7-5}{4}\right)\\ &= P(-2.5 \le Z \le 0.5)\\ &=  0.6915 - 0.0062 = 0.6853. \end{align*}

Using R,

> pnorm(7, mean = 5, sd = 4) - pnorm(-5, mean =  5, sd = 4)
[1] 0.6852528

(iv) The value of c such that P(|X-\mu| < c) = 0.95.

Solution

We know that P(|Z| < 1.96) = 0.95. Standardising with \sigma = 4,

    \[P(|X - \mu| < c) = P\left(|Z| < \frac{c}{4}\right) = 0.95 \Rightarrow \frac{c}{4} = 1.96 \Rightarrow c = 4\times 1.96 = 7.84.\]
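This part has no R calculation in the text, so here is one possible way to do it. The sketch uses qnorm for the critical value and then checks that the resulting interval has probability 0.95; the name c_value is illustrative.

```r
mu <- 5; sigma <- 4               # X ~ N(5, 16), as in the example

c_value <- sigma * qnorm(0.975)   # 0.975 = 0.95 + 0.025 (lower tail included)
c_value                           # about 7.84

# Check: P(mu - c < X < mu + c) should be 0.95.
pnorm(mu + c_value, mean = mu, sd = sigma) -
  pnorm(mu - c_value, mean = mu, sd = sigma)
```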

Example 8.3

Let X \sim N(\mu, 0.250), and suppose P(X < 5.1) = 0.9772. Find the value of \mu.

Solution

    \[P(X<5.1) = P\left(Z < \frac{5.1-\mu}{\sqrt{0.25}}\right) = 0.9772.\]

Now P(Z < 2) = 0.9772, so

    \[\frac{5.1-\mu}{\sqrt{0.25}} = 2 \Rightarrow \mu = 5.1 - 2 \times \sqrt{0.25} = 4.1.\]
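The same inverse calculation can be sketched in R. Here qnorm returns the z value for the given probability; it differs very slightly from 2 because the table value 0.9772 is rounded. The variable names are illustrative.

```r
sigma <- sqrt(0.25)      # standard deviation, from the variance 0.25
z <- qnorm(0.9772)       # close to 2; the tables round this to exactly 2
mu <- 5.1 - z * sigma
mu                       # about 4.1

# Check: with this mean, P(X < 5.1) returns the given probability.
pnorm(5.1, mean = mu, sd = sigma)
```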

Example 8.4

A machine fills bottles of soft drink to a mean volume of 210 ml with a standard deviation of 10 ml. The label on the bottle specifies a volume of 200 ml. A bottle is under-filled if it contains less than the labelled volume. Assume that the volumes of the bottles are normally distributed.

(a) What percentage of bottles are under-filled?

(b) In order to reduce the percentage of under-filled bottles to 1%, the company decides to adjust the standard deviation of the volumes filled by the machine. What should the standard deviation be reduced to?

Solution

(a) Let the random variable X denote the volume of a bottle of soft drink. Then X \sim N\left(210,10^2\right).

    \[P(X < 200) = P\left(Z < \frac{200-210}{10}\right) = P(Z < -1) = 0.1587,\]

that is, 15.9%.

Using R,

> pnorm(200, mean = 210, sd = 10)
[1] 0.1586553

(b) Now X \sim N\left(210, \sigma^2\right). Then

    \[P(X < 200) = P\left(Z < \frac{200-210}{\sigma}\right) = 0.01.\]

Now P(Z < -2.3263) = 0.01, so

    \[\frac{200-210}{\sigma} = - 2.3263 \Rightarrow \sigma = \frac{200-210}{-2.3263} = 4.30,\]

that is, the standard deviation should be reduced to 4.30 ml.
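Part (b) can also be worked in R. The sketch below is one way to set it up, with illustrative variable names: qnorm gives the z value for the target probability, and the final line checks the resulting proportion of under-filled bottles.

```r
mu <- 210; label <- 200   # mean fill and labelled volume (ml), from the example
target <- 0.01            # required proportion of under-filled bottles

z <- qnorm(target)        # about -2.3263
sigma_new <- (label - mu) / z
sigma_new                 # about 4.30

# Check: proportion under-filled with the new standard deviation.
pnorm(label, mean = mu, sd = sigma_new)
```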

8.3 Sum of normal random variables

Result

Let X \sim N\left(\mu_X, \sigma_X^2\right), Y \sim N\left(\mu_Y, \sigma_Y^2\right) and put W = X \pm Y. Then W \sim N \left(\mu_W, \sigma_W^2\right), where

    \begin{align*} \mu_W &= \mu_X \pm \mu_Y\\ \text{and\ } \sigma^2_W &= {\rm Var}(X \pm Y)\\ &= {\rm Var}(X) + {\rm Var}(Y) \pm 2 {\rm Cov}(X,Y). \end{align*}

If X and Y are independent, then {\rm Cov}(X,Y)=0, so

    \[\sigma^2_W = \sigma_X^2 + \sigma_Y^2.\]

This result can be extended to a sum of several independent normal random variables.
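A small simulation can make the independent case concrete. The sketch below is only an illustration under assumed parameter values (means 10 and 4, standard deviations 3 and 4, and an arbitrary seed): it simulates independent X and Y, forms D = X - Y, and compares the simulated mean, variance and a probability with those given by the result.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
n <- 1e5                           # number of simulated pairs (assumed)

# Assumed illustrative parameters; X and Y are generated independently.
x <- rnorm(n, mean = 10, sd = 3)
y <- rnorm(n, mean = 4,  sd = 4)
d <- x - y

c(mean = mean(d), var = var(d))    # should be close to 6 and 3^2 + 4^2 = 25

# Compare a simulated probability with the normal result D ~ N(6, 25).
sim_prob    <- mean(d > 0)
theory_prob <- pnorm(0, mean = 6, sd = 5, lower.tail = FALSE)
c(simulated = sim_prob, theoretical = theory_prob)
```

Note that the variances add even though D is a difference; the simulated variance is close to 25, not to 9 - 16.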

 

Example 8.5

Vasilopoulos et al. (2020) investigated the length of the femur in human males and females. Their results for the right male and female femur are tabulated below.

                           Male     Female
Mean (cm)                  43.04    39.90
Standard deviation (cm)    2.32     2.40

What is the probability that a randomly selected male right femur is longer than a randomly selected female right femur?

Reference: Vasilopoulos A, Tsoucalas G, Panagouli E, Trypsianis G, Thomaidis V, Fiska A. (2020). Odontoid Process and Femur: A Novel Bond in Anatomy. Cureus, 12(3):e7372. https://doi.org/10.7759/cureus.7372

 

Solution

Let the random variables M and F denote the lengths of the right femur of a randomly selected male and female, respectively. Then M \sim N\left(43.04, 2.32^2\right) and F \sim N\left(39.90, 2.40^2\right). Put D = M - F. Then

    \[{\rm E}(D) = {\rm E}(M) - {\rm E}(F) = 43.04 - 39.90 = 3.14, \quad {\rm Var}(D) = {\rm Var}(M) + {\rm Var}(F) = 2.32^2 + 2.40^2 = 11.1424.\]

Then D \sim N(3.14, 11.1424). Note that we have assumed that the lengths of the femurs of the males and females are independent. We need

    \begin{align*} P(M > F) &= P(M - F >0)\\ &= P(D >0)\\ &= 1-0.1734 = 0.8266. \end{align*}

Note that we obtained the probability directly from R, using

> pnorm(0, 3.14, sqrt(11.1424), lower.tail = FALSE)
[1] 0.8265647

Example 8.6

A machine makes washers with hole diameters that are normally distributed, with mean 15.2 mm and variance 0.03 mm^2. Another machine makes bolts with diameters that are normally distributed, with mean 15.0 mm and variance 0.01 mm^2.
(a)  What is the probability that a randomly selected bolt will fit through a randomly selected washer?
(b) What should the mean diameter of the washer holes be if 99% of the bolts are to fit the washers?

Solution

Let the random variables W and B denote the diameters of a randomly selected washer and bolt respectively. Then W \sim N(15.2, 0.03), B \sim N(15.0, 0.01). Further, let D = W - B, so D \sim N(0.2, 0.04).

(a)

    \begin{align*} P(W > B) &= P(W - B > 0)\\ &= P(D > 0)\\ &= 1 - 0.1587 = 0.8413. \end{align*}

Again we obtained the probability P(D< 0) directly from R using

> pnorm(0,0.2, sqrt(0.04))
[1] 0.1586553

(b) Now W \sim N(\mu, 0.03), B \sim N(15.0, 0.01), so D = W - B \sim N(\mu - 15, 0.04), and

    \[P(D > 0) = P\left(Z > \frac{0-(\mu-15)}{0.2}\right) = 0.99.\]

Since P(Z > -2.3263) = 0.99, we have

    \[\frac{(\mu-15)}{0.2} = 2.3263 \Rightarrow \mu = 15 + 2.3263 \times 0.2 = 15.4653 \approx 15.47 {\rm\ mm.}\]
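The inverse calculation in part (b) can also be done in R with qnorm. The sketch below is one way to set it up; the variable names are illustrative, and independence of W and B is assumed, as in the solution.

```r
sd_d <- sqrt(0.03 + 0.01)   # sd of D = W - B, assuming independence
mu_b <- 15.0                # mean bolt diameter (mm)

# P(D > 0) = 0.99 requires (mu_w - mu_b) / sd_d = qnorm(0.99).
mu_w <- mu_b + qnorm(0.99) * sd_d
mu_w                        # required mean washer hole diameter (mm)

# Check: probability that a bolt fits with this washer mean.
pnorm(0, mean = mu_w - mu_b, sd = sd_d, lower.tail = FALSE)
```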

Licence

Statistics: Meaning from data Copyright © 2024 by Dr Nazim Khan is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
