3 Probability

Learning Outcomes

At the end of this chapter you should be able to:

  1. use the axioms of probability to prove simple probability results;
  2. use the axioms of probability to solve problems involving probabilities;
  3. understand the concept of conditional probability and use it to solve problems;
  4. know the multiplication rule and use it to solve problems;
  5. understand the concepts of independence and disjoint (mutually exclusive)
    events, and use these to solve problems;
  6. solve probability problems using tree diagrams.

 

Brief history of Probability

A gambler’s dispute in 1654 led to the creation of a mathematical theory of probability by two French mathematicians, Blaise Pascal and Pierre de Fermat. Antoine Gombaud (who called himself Chevalier de Méré, meaning Knight of Méré), a French nobleman and writer with an interest in gaming and gambling questions, called Pascal’s attention to an apparent contradiction concerning a popular dice game.

Copy of an oil painting of Pascal with a dark background. Pascal has pale skin and brown or dark hair. Details of the painting beyond his face and large white collar merge into the dark background.
Blaise Pascal. Copy of oil on canvas painting c 1600s.
Reproduced under a CC BY 3.0 licence from WikiMedia Commons.
Oil painting of Pierre de Fermat. De Fermat has pale skin, long dark hair and a small moustache. He wears a dark coat or cloak draped over his shoulders.
Pierre de Fermat
Copy of oil on canvas painting by Roland Lefevre. Public Domain artwork sourced from WikiMedia Commons

The problem

The game consisted of throwing a pair of dice 24 times. The problem was to decide whether or not to bet even money on the occurrence of at least one double six during the 24 throws. A seemingly well-established gambling rule led de Méré to believe that betting on a double six in 24 throws would be profitable. But his own calculations showed just the opposite.

This problem posed by de Méré and others led to an exchange of letters between Pascal and Fermat in which the fundamental principles of probability were formulated for the first time.

 

Question: What is the probability of winning in the above game if you bet on “at least one double six”?
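One way to approach this question is sketched below. It relies on two ideas developed later in the chapter: the complement rule (P2 in Section 3.3) and independence (Section 3.7). The sketch assumes fair dice and independent throws; the variable names are illustrative only.

```python
from fractions import Fraction

# Probability that a single throw of two fair dice is NOT a double six:
# 35 of the 36 equally likely outcomes.
p_no_double_six = Fraction(35, 36)

# Assuming the 24 throws are independent, the probability of seeing
# no double six at all is (35/36)**24. "At least one double six" is
# the complement of that event.
n_throws = 24
p_at_least_one = 1 - p_no_double_six ** n_throws

print(float(p_at_least_one))  # slightly below 0.5
```

Under these assumptions the probability is a little less than one half, which is consistent with de Méré's finding that the even-money bet was not profitable.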

 

3.1 Notation and terminology

A random phenomenon is one for which the outcome cannot be predicted with certainty in advance. A random experiment is a process that generates outcomes that can only be described in terms of probabilities. Common examples of random experiments are tossing a coin and rolling a die.

Sample space

The sample space, denoted S, is the set of all possible outcomes of a random experiment. Any outcome in the sample space is called an elementary event. An event is a subset of the sample space.

Example 3.1

If a coin is tossed twice then the sample space is S =\left\{HH, HT, TH, TT\right\}. Then \left\{HH\right\} is an elementary event. Further, \left\{HH,HT,TH\right\} is an event, which can be described in words as “at least one H”.

Assigning probabilities

The probability of an event A is the number of outcomes in the sample space favourable to A divided by the total number of outcomes, that is

    \[{\rm P}(A) = \frac{\# \ {\rm of\ outcomes\ favourable\ to\ } A}{\rm Total\ number\ of\ outcomes}.\]

Note that this assignment assumes that the outcomes are equally likely.

Example 3.1 (ctd)

If we toss a coin twice, then

    \begin{align*} {\rm P}(2H) &= \frac{1}{4}\\ {\rm P}({\rm At\ least\ one\ H}) &= \frac{3}{4}\\ {\rm P}({\rm No\ H}) &= \frac{1}{4}. \end{align*}
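The counting rule above can be checked by listing the sample space explicitly. The following is a minimal sketch, assuming a fair coin so that the four outcomes are equally likely; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: every ordered pair of H/T.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def prob(event):
    """Classical probability: favourable outcomes / total outcomes.

    `event` is a function that returns True for outcomes in the event.
    Valid only when all outcomes are equally likely.
    """
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

p_two_heads = prob(lambda s: s.count("H") == 2)
p_at_least_one_head = prob(lambda s: "H" in s)
p_no_heads = prob(lambda s: "H" not in s)

print(sample_space)
print(p_two_heads, p_at_least_one_head, p_no_heads)
```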

The null or empty event

The null event or empty event is denoted \phi (phi) and contains no outcomes.

Example 3.1 (ctd)

Toss a coin twice. The event “Obtaining 3 H” is a null event, since we cannot get three H from two tosses of a coin.

Union and Intersection

The union of A and B is denoted A\cup B, and is the set of points that are in A or B or both. The intersection of two sets A and B is denoted A \cap B, and is the set of points that are in both A and B. We often say “A or B” to indicate union, and “A and B” to indicate intersection. These relationships can be represented by Venn diagrams, as below.

Union of two sets.
Intersection of two sets.

Complementary and Disjoint Events

The complement of an event A is denoted A', \overline A or A^c, and is the set of points that are not in A. This is illustrated in the Venn diagram below.

Two events A and B are mutually exclusive or disjoint if A\cap B = \phi, that is, A and B have no intersection. Disjoint events have no points in common.
The complement of a set. A circle labelled A. All the space outside the circle is highlighted and labelled Complement of A.
The complement of a set.
Disjoint or mutually exclusive sets. Circles labelled A and B which are separate with no overlap.
Disjoint or mutually exclusive sets.
Exercises
  1. What is the complement of A^c?
  2. What is the intersection of A and A^c?

Note that A\cup A^c = S, the entire sample space.

Example 3.2 

Two coins are tossed. Let A denote the event that the first toss yields a H.

(a) What is the complement of A?

(b) Specify some events that are disjoint with A.

Solution

(a) Here

    \[S = \{HH, HT, TH, TT\}, A = \{HH, HT\}, A^c = \{TH, TT\}.\]

(b) A^c is disjoint with A. \{TH\} and \{TT\} are also disjoint with A.
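The set operations of this section map directly onto Python's built-in `set` type. The sketch below reproduces Example 3.2; the event `B` (second toss yields H) is a hypothetical extra event added only to illustrate union and intersection.

```python
# Sample space for two coin tosses, as in Example 3.2.
S = {"HH", "HT", "TH", "TT"}

# A: the first toss yields H.
A = {s for s in S if s[0] == "H"}

# Complement of A relative to the sample space S.
A_c = S - A

# B: the second toss yields H (hypothetical event, for illustration only).
B = {s for s in S if s[1] == "H"}

union = A | B          # outcomes in A or B or both
intersection = A & B   # outcomes in both A and B

# Two events are disjoint when their intersection is empty.
print(A_c, A.isdisjoint(A_c), A.isdisjoint({"TH"}))
print(union, intersection)
```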

3.2 Axioms of probability

 

Black and white photograph of Andrey Kolmogorov standing in front of a chalkboard, holding pages of notes.
Andrey Kolmogorov
Reproduced under a CC BY-SA 4.0 DEED license from WikiMedia Commons

Kolmogorov (1903-1987), a Soviet mathematician, one of the most influential mathematicians of the twentieth century, put probability on a firm mathematical foundation (1933). He also contributed to topology, intuitionistic logic, turbulence, classical mechanics, algorithmic information theory and computational complexity.

He postulated the following three axioms of probability.

A1. P(S) = 1. (Something must happen.)

A2. P(A) \ge 0 for any event A.

A3. If A and B are disjoint then

    \[P(A\cup B) = P(A) + P(B).\]

Exercise

Draw a Venn diagram to represent Axiom A3.

 

3.3 Rules of Probability

Based on the axioms of probability we can derive the following rules. For any events A and B,

P1. P(\phi) = 0

P2. P(A^c) = 1-P(A)

P3. 0 \le P(A) \le 1

P4. P(A\cup B) = P(A) + P(B) - P(A\cap B)

Proof

P1. P(\phi) = 0.

Note that S and \phi are disjoint since S \cap \phi = \phi. Also, S \cup \phi = S and P(S) = 1, so

    \begin{align*} P(S\cup \phi) &= P(S)\\ {\rm so\ } P(S) + P(\phi) &= P(S) {\rm\ by\ A3,}\\ \Rightarrow 1 + P(\phi) &= 1\\ \Rightarrow P(\phi) &= 0. \end{align*}

P2. P(A^c) = 1-P(A).

A^c and A are disjoint, since A\cap A^c = \phi. Also, A\cup A^c = S. Then

    \begin{align*} P(A\cup A^c) &= P(S)\\ {\rm so\ by\ A3\ and\ A1\  } P(A) + P(A^c) &= 1\\ \Rightarrow P(A^c) &= 1 - P(A). \end{align*}

Note that it also follows that P(A) = 1- P(A^c).

P3. 0 \le P(A) \le 1.

By A2, P(A) \ge 0, that is, 0 \le P(A). Also by A2, P(A^c) \ge 0. But by P2,

    \[P(A) = 1-\underbrace{P(A^c)}_{\ge 0} \le 1.\]

It follows that 0 \le P(A) \le 1.

P4. P(A \cup B) = P(A) + P(B) - P(A\cap B).

Note that A\cap B and A \cap B^c are disjoint sets, and A = (A\cap B) \cup  (A \cap B^c).  Then by A3,

    \[P(A) = P(A\cap B) +  P(A \cap B^c) \Rightarrow P(A\cap B^c) = P(A) - P(A\cap B).\]

Further, B and A\cap B^c are disjoint sets, and their union is A\cup B. Then by A3,

    \begin{align*} A\cup B &= (A\cap B^c) \cup B\\ \Rightarrow P(A\cup B) &= P(A\cap B^c) + P(B)\\ &= P(A) - P(A\cap B) + P(B)\\ &= P(A) + P(B) -  P(A\cap B). \end{align*}

Note that P(A) = P(A\cap B) + P(A \cap B^c) is sometimes read as “P(A) equals the probability of A with B plus the probability of A without B”. This is a form of the Theorem of total probabilities. The event A\cap B^c is also read as “only A occurs”.

Example 3.3

Let P(A) = 0.4, P(B) = 0.5 and P(A \cap B) = 0.3. Calculate

(a) P(at least one of A and B occurs)

(b) P(only A occurs)

(c) P(neither A nor B occurs)

Solution

(a) P(A \cup B) = P(A) + P(B) - P(A\cap B) = 0.4 + 0.5 - 0.3 = 0.6.

(b) The event “only A occurs” is represented by A \cap B^c. Now

    \begin{align*} P(A) &= P(A\cap B) + P(A \cap B^c)\\ \Rightarrow P(A\cap B^c) &= P(A) - P(A\cap B)\\ &= 0.4 - 0.3 = 0.1. \end{align*}

(c) Neither A nor B occurs is equivalent to (A\cup B)^c, so

    \[P[(A\cup B)^c] = 1- P(A\cup B) = 1- 0.6 = 0.4.\]

A Venn diagram can be used to illustrate this situation.
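The arithmetic in Example 3.3 can be reproduced as follows. This is a small sketch using exact fractions to avoid rounding; the variable names are illustrative, and `p_only_b` is an extra quantity (not asked for in the example) included to show that the four regions of the Venn diagram account for the whole sample space.

```python
from fractions import Fraction

# Given probabilities from Example 3.3.
p_a = Fraction(4, 10)
p_b = Fraction(5, 10)
p_a_and_b = Fraction(3, 10)

# (a) At least one of A and B occurs: rule P4.
p_a_or_b = p_a + p_b - p_a_and_b

# (b) Only A occurs: P(A and not B) = P(A) - P(A and B).
p_only_a = p_a - p_a_and_b

# (c) Neither occurs: complement of the union, rule P2.
p_neither = 1 - p_a_or_b

# Extra check: only B occurs, so the four Venn regions can be summed.
p_only_b = p_b - p_a_and_b

print(p_a_or_b, p_only_a, p_neither)
print(p_only_a + p_a_and_b + p_only_b + p_neither)  # total over all regions
```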

3.4 Conditional Probability

For events A and B, the probability of A occurring when it is known that B has occurred is called the conditional probability of A given B, denoted

    \[P(A|B),\]

read as “probability of A given B”. Note that P(A|B) is a probability, so it satisfies ALL the axioms and rules for probabilities.

Definition

If A and B are two events such that P(B) >0, then

    \[P(A|B) = \frac{P(A \cap B)}{P(B)}.\]

Since B has occurred, the sample space is now only B, and the probability that A occurs is the proportion of B where A occurs. This is illustrated in the Venn diagram below.

Illustration of the conditional probability rule. Venn diagram of two circles, A and B, which overlap in the center. B is highlighted. An arrow points to the area where the two circles overlap, with text Given B, A can only occur in this set.
Illustration of the conditional probability rule.

Example 3.4 Bank data

The table below lists the number of bank workers by gender and job-grade.

                  Job Grade
            1    2    3    4    5    6  Total
Female     48   29   36   17    9    1    140
Male       12   13    7   11   12   13     68
Total      60   42   43   28   21   14    208

Data source ©Cengage Learning Inc. Reproduced by permission. www.cengage.com/permissions

 

Using the information in the table, calculate the following probabilities.

(a) An employee selected at random at job grade higher than 4 is a male.

(b) A female employee selected at random is at a job grade higher than 4.

(c) An employee selected at random at job grade below 5 is a female.

Solution

(a) We are given the employee is at Job Grade higher than 4, so we need

    \[P(Male|Job\ Grade \ge 5) = \frac{25}{35} = \frac{5}{7} \approx 0.7143.\]

(b) Here we know the employee is female, so the required probability is

    \[P(Job\ Grade \ge 5|Female) = \frac{10}{140} = \frac{1}{14} \approx 0.0714.\]

(c) Now we know the employee is at Job Grade below 5, so

    \[P(Female|Job\ Grade \le 4) =\frac{130}{173} \approx 0.7514.\]

Proportionally more females are at lower Job Grades.
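The three conditional probabilities in Example 3.4 can be computed directly from the table counts. The sketch below is one possible way to do so; the dictionary layout and the helper `count` are assumptions made for illustration.

```python
from fractions import Fraction

# Counts by job grade 1..6, copied from the table in Example 3.4.
counts = {
    "Female": [48, 29, 36, 17, 9, 1],
    "Male": [12, 13, 7, 11, 12, 13],
}

def count(genders, grades):
    """Number of employees whose gender is in `genders` and grade in `grades`."""
    return sum(counts[g][k - 1] for g in genders for k in grades)

both = ["Female", "Male"]
high = [5, 6]          # job grade higher than 4
low = [1, 2, 3, 4]     # job grade below 5
all_grades = range(1, 7)

# Each conditional probability restricts the "sample space" to the given group.
p_male_given_high = Fraction(count(["Male"], high), count(both, high))
p_high_given_female = Fraction(count(["Female"], high), count(["Female"], all_grades))
p_female_given_low = Fraction(count(["Female"], low), count(both, low))

print(p_male_given_high, p_high_given_female, p_female_given_low)
```

In each case the denominator counts only the employees in the conditioning group, which mirrors the statement that once B has occurred the sample space is reduced to B.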

3.5 Multiplication Rules

The conditional probability rule is

    \[P(A|B) = \frac{P(A\cap B)}{P(B)},\]

from which we obtain

    \[P(A\cap B) = P(B) \times P(A|B).\]

Similarly from

    \[P(B|A) = \frac{P(A\cap B)}{P(A)}\]

we obtain

    \[P(A\cap B) = P(A) \times P(B|A).\]

That is, we can obtain P(A\cap B) as

    \[P(A\cap B) = P(A) \times P(B|A) = P(B) \times P(A|B).\]

This rule essentially says that for A and B to occur together,

  1. first A occurs, and then B occurs given A has occurred, OR
  2. first B occurs and then A occurs given B has occurred.
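As a numerical illustration, both orderings of the multiplication rule can be checked against the bank data of Example 3.4. The sketch assumes two events chosen for illustration: M (the employee is male) and J (the employee is at Job Grade 5 or above).

```python
from fractions import Fraction

# Counts taken from the table in Example 3.4.
total = 208        # all employees
n_male = 68        # event M
n_high = 35        # event J: Job Grade 5 or above (21 + 14)
n_male_high = 25   # M and J together (12 + 13)

p_m = Fraction(n_male, total)
p_j = Fraction(n_high, total)
p_j_given_m = Fraction(n_male_high, n_male)
p_m_given_j = Fraction(n_male_high, n_high)

# Multiplication rule, in both orders.
via_m = p_m * p_j_given_m   # first M, then J given M
via_j = p_j * p_m_given_j   # first J, then M given J

print(via_m, via_j)
```

Both products reduce to the same fraction, 25/208, which is the proportion of employees who are male and at Job Grade 5 or above.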

3.6 Tree Diagrams

Tree diagrams can be used to facilitate the solution of conditional probability problems. They implement the multiplication rule and the rule of total probabilities. This is illustrated in the tree diagram below.

Tree diagram illustrating the use of the conditional probability calculations for two sequential events A and B. We begin with a node with two arms, one leading to A and the other to A' (A complement). The arms are labelled P(A) and P(A') respectively. From the node with A, two arms lead to B and B' (B complement). The arms are labelled P(B) and P(B') respectively. Similarly, from the node A', two arms lead to B and B', and are labelled P(B) and P(B') respectively.
Tree diagram illustrating the use of the conditional probability calculations for sequential events.

To calculate the probability for any event we simply multiply the probabilities along the path of the event. For example, the path for P(A\cap B) is along the top branches of the tree, so this probability is found by multiplying the probabilities of the branches, that is,

    \[P(A\cap B) = P(A) \times P(B|A).\]

Similarly,

    \[P(\overline A\cap B) = P(\overline A) \times P(B|\overline A).\]

Note that the probabilities along branches that originate at the same node sum to 1.

The use of tree diagrams to solve probability problems is illustrated in the following example.

Example 3.5

Machines A and B turn out respectively 10% and 90% of the total production of a certain type of article. The probability Machine A produces a defective article is 0.01, while that for Machine B is 0.05. A randomly selected article from a day’s production is defective. What is the probability the article was made by Machine A?

Solution

The solution to these probability problems can be quite tricky. We outline the steps that one needs to follow to solve such problems efficiently and simply.

Step 1. Define appropriate events. Let A denote the event that an item is produced by Machine A and D denote the event that the item is defective.

Note that it might seem as if there are four events: an item is produced by Machine A or by Machine B, and an item is defective or not. But the other two events (item is produced by Machine B, item is not defective) are the complements of the defined events.

Step 2. Express given probabilities in terms of these events and draw a tree diagram to represent this information.

    \[P(A) = 0.1, P(\overline A) = 0.9, P(D|A) = 0.01, P(D|\overline A) = 0.05.\]

Note that it follows that P(\overline D|A) = 0.99 and P(\overline D|\overline A) = 0.95. The tree diagram is given below.

Tree diagram for Example 3.5. We start with a node with two arms, one leads to A and the other to A'. The arms are labelled with the probabilities 0.1 and 0.9 respectively. From A, we have two arms leading to D and D', labelled with the probabilities 0.01 and 0.99. From A', two arms lead to D and D', labelled with the probabilities 0.05 and 0.95 respectively.
Tree diagram for Example 3.5.

Step 3. Express the required probability in terms of the defined events, and calculate it using the probability rules and the tree diagram.

We need P(A|D). Sometimes identifying the required probability is difficult. Here the question asks for the probability that the article was made by Machine A, so the event of interest is A; the remaining information, that the selected article is defective, is the given or conditional part of the probability statement. Using the conditional probability rule we get

(1)   \begin{align*} P(A|D) &=\frac{P(A\cap D)}{P(D)} \\ &= \frac{P(A \cap D)}{P(A\cap D) + P(\overline A\cap D)} \\ &= \frac{0.1 \times 0.01}{0.1 \times 0.01 + 0.9 \times 0.05} \\ &= \frac{1}{46} \approx 0.0217. \end{align*}

Note that one could go from the first line straight to the third line, following the path along the two branches in which the event D occurs.
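The tree-diagram calculation for Example 3.5 can be sketched in code: multiply along each path that ends in D, add the paths to get P(D), then divide. The variable names are illustrative; exact fractions are used so that the result can be compared with 1/46.

```python
from fractions import Fraction

# Given probabilities from Example 3.5.
p_a = Fraction(10, 100)             # P(A): article made by Machine A
p_d_given_a = Fraction(1, 100)      # P(D | A)
p_d_given_not_a = Fraction(5, 100)  # P(D | not A), i.e. Machine B

# Complement rule gives the other first-level branch.
p_not_a = 1 - p_a

# Multiply along each path that ends in "defective".
p_a_and_d = p_a * p_d_given_a
p_not_a_and_d = p_not_a * p_d_given_not_a

# Total probability of a defective article: sum over both paths.
p_d = p_a_and_d + p_not_a_and_d

# Conditional probability rule.
p_a_given_d = p_a_and_d / p_d

print(p_d, p_a_given_d, float(p_a_given_d))
```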

Example 3.6

Consider again the data of Example 3.4.

                  Job Grade
            1    2    3    4    5    6  Total
Female     48   29   36   17    9    1    140
Male       12   13    7   11   12   13     68
Total      60   42   43   28   21   14    208

Data source ©Cengage Learning Inc. Reproduced by permission. www.cengage.com/permissions

 

What is the probability that an employee picked at random

(a) is female and is at Job Grade 5 or above?

(b) at Job Grade 4 or below is male?

(c) at Job Grade 5 or above is female?

(d) at Job Grade 5 or above is male?

On the basis of the above calculations, what can be concluded about the promotion culture in the bank?

Solution

Let M denote the event that the employee is male and J the event that the employee is at Job Grade 5 or above.

(a) \displaystyle{P(\overline M\cap J) = \frac{10}{208} \approx 0.0481.}

(b) \displaystyle{P(M|\overline J) = \frac{43}{173} \approx 0.2486.}

(c) \displaystyle{P(\overline M| J) = \frac{10}{35} \approx 0.2857.}

(d) \displaystyle{P(M| J) = 1-P(\overline M| J)= 1-0.2857 =0.7143.}

Fewer than 30% of the employees at Job Grade 5 or above are female, even though females make up about 67% of all employees (140 out of 208). This suggests that there may be a gender bias in promotions. However, this does not adjust for differences in education, experience, and other variables that may affect promotion.

3.7 Independent events

Intuitively, two events A and B are independent if the occurrence of one does not affect the probability of the occurrence of the other. Formally, events A and B are independent if

(2)   \begin{equation*} P(A\cap B) = P(A) \times P(B).  \end{equation*}

Otherwise the two events are dependent.

Notes

  1. To determine if two events are independent, the above condition in equation (2) needs to be verified.
  2. If A and B are independent events with P(B) > 0, then

        \[P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A),\]

    so the occurrence of B does not affect the probability of the occurrence of A.

  3. Similarly, if P(A) > 0,

        \[P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A) \times P(B)}{P(A)} = P(B),\]

    so the occurrence of A does not affect the probability of the occurrence of B.

This is intuitively what independence means. That is, if two events are independent then the occurrence of one does not affect the probability of the occurrence of the other.

Do not confuse mutually exclusive events with independent events.

Events A and B are mutually exclusive (or disjoint) if A \cap B = \phi, in which case

    \[P(A \cap B) = 0.\]

Two circles labelled A and B which do not touch or overlap.

Events A and B are independent if

    \[P(A\cap B) = P(A) \times P(B).\]

Example 3.7 Bank Data

Consider again the data of Example 3.4.

                  Job Grade
            1    2    3    4    5    6  Total
Female     48   29   36   17    9    1    140
Male       12   13    7   11   12   13     68
Total      60   42   43   28   21   14    208

Data source ©Cengage Learning Inc. Reproduced by permission. www.cengage.com/permissions

 

(a) Let M denote the event that an employee is male, and J be the event that the employee is at Job Grade 5 or above. Are the events M and J independent?

(b) Is gender independent of Job Grade? If not, what is the relationship between gender and Job Grade?

Solution

(a) From the table,

    \[P(M) = \frac{68}{208}, P(J) = \frac{35}{208}, P(M \cap J) = \frac{25}{208} \approx 0.1202.\]

Then

    \[P(M) \times P(J) = \frac{68}{208} \times \frac{35}{208} \approx  0.0550 \ne P(M\cap J).\]

Thus M and J are NOT independent.

(b) Gender and Job Grade are dependent by part (a) above. Males are more likely to be at higher Job Grades and females are more likely to be at lower Job Grades.
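The independence check in part (a) amounts to comparing P(M ∩ J) with the product P(M) × P(J). A minimal sketch, using exact fractions so that the comparison is not affected by rounding (variable names are illustrative):

```python
from fractions import Fraction

total = 208  # all employees in the table of Example 3.4

p_m = Fraction(68, total)        # P(M): employee is male
p_j = Fraction(35, total)        # P(J): Job Grade 5 or above
p_m_and_j = Fraction(25, total)  # P(M and J)

# Independence holds exactly when P(M and J) equals P(M) * P(J).
product_of_marginals = p_m * p_j
independent = (p_m_and_j == product_of_marginals)

# Equivalent view: compare P(J | M) with the unconditional P(J).
p_j_given_m = p_m_and_j / p_m

print(float(p_m_and_j), float(product_of_marginals), independent)
print(float(p_j_given_m), float(p_j))
```

Knowing that an employee is male raises the probability of being at Job Grade 5 or above well beyond the unconditional value, which is another way of seeing that the two events are dependent.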

Example 3.8

A woman is selling her house. She believes that there is a 0.3 chance that a person who inspects her house will purchase it. Assuming that the people inspecting the house decide independently whether or not to purchase the house, what is the probability that more than two people will inspect the house before it is sold?

Solution

Let A_1 denote the event that the first person to view the house buys it, and let A_2 denote the event that the second person to view the house buys it. Then A_1 and A_2 are independent. If more than two people view the house before it is sold, then this means that the first two to view the house do not buy it. This is represented by the event \overline A_1 \cap \overline A_2. Since A_1 and A_2 are independent, so are \overline A_1 and \overline A_2 (see result below). Then

    \[P(\overline A_1 \cap \overline A_2) = P(\overline A_1) \times P(\overline A_2) = (1-0.3) \times (1-0.3) = 0.49.\]

Result

If events A and B are independent, then the pairs of events \overline A and B, A and \overline B, \overline A and \overline B are also independent.

Exercise: Prove this result.

3.8 Exercise

Draw a tree diagram for Example 3.8.

Example 3.9

While searching for oil in Australia, an oil explorer orders seismic tests to determine if oil is likely to be found in a certain drilling area. The following probabilities summarise past results concerning the reliability of the test: when oil does exist in the testing area, the test will indicate so 85% of the time; when oil does not exist in the testing area, the probability is 0.03 that the test will erroneously indicate that oil does exist. Preliminary exploration by geologists indicates that the probability of the existence of oil deposits in the test area is 0.45. If the seismic test is conducted and indicates the presence of oil, what is the probability that an oil deposit really does exist?

Solution

Let O denote the event that oil is present and T denote the event that the test indicates the presence of oil. Then P(T|O) = 0.85, P(T|\overline O) = 0.03, P(O) = 0.45. The tree diagram below represents this information.

Tree diagram for Example 3.9. We start with a node with two arms, one leading to O and the other to O', labelled with probabilities 0.45 and 0.55 respectively. From O we have two arms leading to T and T', the first labelled with the probability 0.85 and the other one blank. From O' we have two arms leading to T and T', the first blank and the second labelled with the probability 0.97.
The tree diagram for Example 3.9.

We need

    \begin{align*} P(O|T) &= \frac{P(O\cap T)}{P(T)}\\ &= \frac{0.45 \times 0.85}{0.45 \times 0.85 + 0.55 \times 0.03}\\ &\approx 0.9586. \end{align*}
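Examples 3.5 and 3.9 follow the same pattern: a prior probability, two conditional probabilities, and a reversed conditional probability as the answer. The sketch below wraps that pattern in a small function; `posterior` and its parameter names are hypothetical, introduced here only for illustration.

```python
from fractions import Fraction

def posterior(prior, p_signal_given_event, p_signal_given_not_event):
    """Return P(event | signal) for a two-branch tree diagram.

    prior                    -- P(event)
    p_signal_given_event     -- P(signal | event)
    p_signal_given_not_event -- P(signal | not event)
    """
    # Multiply along the two paths that end in the signal.
    path_event = prior * p_signal_given_event
    path_not_event = (1 - prior) * p_signal_given_not_event
    # Divide the path of interest by the total probability of the signal.
    return path_event / (path_event + path_not_event)

# Example 3.9: event = oil present (O), signal = test indicates oil (T).
p_oil_given_test = posterior(Fraction(45, 100), Fraction(85, 100), Fraction(3, 100))

# Example 3.5: event = made by Machine A, signal = article is defective.
p_a_given_defective = posterior(Fraction(10, 100), Fraction(1, 100), Fraction(5, 100))

print(float(p_oil_given_test), float(p_a_given_defective))
```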

Licence


Statistics: Meaning from data Copyright © 2024 by Dr Nazim Khan is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
