49 Success fractions and sample space
Let’s take a look at a few counter-intuitive examples.
Consider the table of applicants to the University of California, Berkeley, in 1973.
| Applicants | Admitted |
Males | 8442 | 3738 |
Females | 4321 | 1494 |
To consider what this means, and how we might interpret it, we should first take a look at the concept of a success fraction.
Take another quick look at the table above. Consider the ‘probability’ of being admitted for a male applicant: each application is like a ‘trial’ and each admission a ‘success’, so males have a 44% chance of success ([latex]\frac{3738}{8442}[/latex]), compared with 35% for females ([latex]\frac{1494}{4321}[/latex]).
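The success fractions quoted above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary names and the `success_fraction` helper are illustrative choices, and the counts are simply copied from the table.

```python
# Applicant and admission counts copied from the table above.
applicants = {"Males": 8442, "Females": 4321}
admitted = {"Males": 3738, "Females": 1494}


def success_fraction(successes, trials):
    """Return the fraction of trials that ended in success."""
    return successes / trials


# One success fraction per group (hypothetical variable name).
rates = {
    group: success_fraction(admitted[group], applicants[group])
    for group in applicants
}

for group, rate in rates.items():
    # Print each rate as a percentage with one decimal place.
    print(f"{group}: {rate:.1%}")
```

Rounded to whole percentages, these values agree with the 44% and 35% figures in the text.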
Of course, it feels like there is something fundamentally different between events like tossing a coin, which seem to be governed by ‘real’ chance, and things like sports matches, job applications or meal preferences, where the ‘probability’ is really an observed frequency.
Actually, in the Berkeley example, this difference can be attributed largely to the different course preferences of females and males: a higher proportion of males applied for courses that had higher probabilities of success. For example, suppose the numbers looked like this:
| Applicants for courses that are easy to get into | Successful applicants (easy courses) | Applicants for courses that are hard to get into | Successful applicants (hard courses) |
Males | 150 | 75 (50%) | 50 | 10 (20%) |
Females | 50 | 25 (50%) | 50 | 10 (20%) |
If we take the success fraction of males and females as groups, this would result in [latex]\frac{85}{200}=42.5\%[/latex] success for males and [latex]\frac{35}{100}=35\%[/latex] success for females – even though they had the same success rates when broken down. This is called Simpson’s paradox, and is an example of how tricky reasoning about correlation can be. These hidden breakdowns could also be going on in other probabilistic situations.
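The aggregation step can be made explicit in code. The sketch below uses the hypothetical numbers from the table above; the `data` layout and the `overall_rate` helper are assumptions made for illustration. It shows that identical per-course rates can still produce different overall rates once the groups are pooled.

```python
from fractions import Fraction

# (applicants, successful applicants) per course type, from the hypothetical table.
data = {
    "Males": {"easy": (150, 75), "hard": (50, 10)},
    "Females": {"easy": (50, 25), "hard": (50, 10)},
}


def overall_rate(courses):
    """Pool all courses for one group and return the combined success fraction."""
    total_applicants = sum(n for n, _ in courses.values())
    total_successes = sum(s for _, s in courses.values())
    return Fraction(total_successes, total_applicants)


for group, courses in data.items():
    # Success fraction within each course type.
    per_course = {name: Fraction(s, n) for name, (n, s) in courses.items()}
    print(group, "per course:", per_course, "overall:", overall_rate(courses))
```

The per-course fractions are 1/2 and 1/5 for both groups, while the pooled fractions are 17/40 (42.5%) for males and 7/20 (35%) for females, because the two groups split their applications differently between the course types.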
Sample space
When trying to determine theoretical probabilities, the key is often to determine the sample space – the set of all possible outcomes that can occur. It’s easiest when each of the outcomes is equally likely. For example, when rolling a die, the potential outcomes are that a 1, 2, 3, 4, 5, or 6 is rolled, and each of these possibilities is equally likely.
We can hence calculate the theoretical probability of an event based on this sample space: it is the number of outcomes resulting in success divided by the total number of equally likely outcomes. Consider the following examples.
Event | Outcomes resulting in success | Probability |
Rolling a 2 | 2 | [latex]\frac{1}{6}[/latex] |
Rolling an even number | 2,4,6 | [latex]\frac{3}{6}=\frac{1}{2}[/latex] |
Rolling a number greater than 3 | 4,5,6 | [latex]\frac{3}{6}=\frac{1}{2}[/latex] |
Not rolling a 6 | 1,2,3,4,5 | [latex]\frac{5}{6}[/latex] |
It’s worth noting that different “events” can have the same probability of success.
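The rows of the table can be reproduced by counting outcomes in the sample space. This is a minimal sketch assuming a fair six-sided die; the `probability` helper and the `events` dictionary are names introduced here for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]


def probability(event):
    """Count the outcomes satisfying `event` and divide by the size of the sample space."""
    matching = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(matching), len(sample_space))


# Each event is written as a test applied to a single outcome.
events = {
    "Rolling a 2": lambda n: n == 2,
    "Rolling an even number": lambda n: n % 2 == 0,
    "Rolling a number greater than 3": lambda n: n > 3,
    "Not rolling a 6": lambda n: n != 6,
}

for name, event in events.items():
    print(f"{name}: {probability(event)}")
```

Using `Fraction` keeps the results exact and automatically reduces 3/6 to 1/2, which makes it easy to see that two different events share the same probability.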
It becomes more difficult when we have multiple processes leading to an event. A simple example is when we extend the sample space associated with a single roll of a die to the rolling of 2 dice. We now need to think of all combinations that can occur with respect to both dice.
Let’s be organized and represent this sample space using a table. The row indicates the number on the first die, and the column indicates the number on the second die.
| 1 | 2 | 3 | 4 | 5 | 6 |
1 | (1,1) | (1,2) | (1,3) | (1,4) | (1,5) | (1,6) |
2 | (2,1) | (2,2) | (2,3) | (2,4) | (2,5) | (2,6) |
3 | (3,1) | (3,2) | (3,3) | (3,4) | (3,5) | (3,6) |
4 | (4,1) | (4,2) | (4,3) | (4,4) | (4,5) | (4,6) |
5 | (5,1) | (5,2) | (5,3) | (5,4) | (5,5) | (5,6) |
6 | (6,1) | (6,2) | (6,3) | (6,4) | (6,5) | (6,6) |
In this case there may be a number of ways of representing the ‘sample space’, depending on what we’re ultimately interested in. If we are just interested in the possible outcomes of 2 dice, then the above table is sufficient for representing the sample space of the 36 equally likely outcomes. However, if we’re interested in the sum of the two dice, then the sample space could also be expressed as {2,3,4,5,6,7,8,9,10,11,12}, keeping in mind that in this case we’d need to be careful in making calculations because those sums are not all equally likely. For example, looking at the table, only one outcome, (1,1), gives a sum of 2, so there is a [latex]1/36[/latex] chance of rolling a sum of 2, while six outcomes give a sum of 7, so there is a [latex]6/36=1/6[/latex] chance of rolling a 7.
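Counting the ways each sum can occur is tedious by hand but short in code. The sketch below, which assumes two fair dice, enumerates the 36 ordered pairs and tallies their sums; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# How many pairs produce each possible sum.
sum_counts = Counter(first + second for first, second in outcomes)

# Probability of each sum: matching pairs divided by the 36 total pairs.
sum_probabilities = {
    total: Fraction(count, len(outcomes))
    for total, count in sorted(sum_counts.items())
}

for total, prob in sum_probabilities.items():
    print(f"sum {total}: {sum_counts[total]} of {len(outcomes)} outcomes, probability {prob}")
```

The counts rise from one outcome for a sum of 2 up to six outcomes for a sum of 7 and then fall again to one outcome for a sum of 12, which is why the sums cannot be treated as equally likely.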
The traveller and the ticket inspector
Suppose a forgetful man boards a train every day, but only remembers to validate his ticket 2/3 of the time. Ticket inspectors board his train at random, 1/6 of the time, and check whether tickets are validated.
How could you represent the sample space of random events here? What are the different outcomes? If we relate this to rolls of two dice, which numbers on the first die could we use to represent the forgetful man validating his ticket, and which numbers on the second die could we use to represent the ticket inspectors boarding the train?
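One possible representation, offered only as a sketch, reuses the two-dice table. It rests on assumptions that are not stated in the problem: that validating the ticket and the inspectors boarding are independent, that faces 1 to 4 of the first die stand for "ticket validated" (4 of 6 faces, or 2/3), and that face 6 of the second die stands for "inspectors board" (1 of 6 faces). Other choices of faces with the same counts would work equally well.

```python
from fractions import Fraction
from itertools import product

# Assumption: faces 1-4 of the first die mean the ticket was validated (4/6 = 2/3).
VALIDATED_FACES = {1, 2, 3, 4}
# Assumption: face 6 of the second die means inspectors board the train (1/6).
INSPECTOR_FACES = {6}

# The 36 equally likely (first die, second die) pairs; assumes the two events are independent.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    """Fraction of the 36 pairs for which event(first, second) is true."""
    matching = [pair for pair in outcomes if event(*pair)]
    return Fraction(len(matching), len(outcomes))


results = {
    "validated, inspected": probability(
        lambda a, b: a in VALIDATED_FACES and b in INSPECTOR_FACES),
    "validated, not inspected": probability(
        lambda a, b: a in VALIDATED_FACES and b not in INSPECTOR_FACES),
    "not validated, inspected": probability(
        lambda a, b: a not in VALIDATED_FACES and b in INSPECTOR_FACES),
    "not validated, not inspected": probability(
        lambda a, b: a not in VALIDATED_FACES and b not in INSPECTOR_FACES),
}

for outcome, prob in results.items():
    print(f"{outcome}: {prob}")
```

Under these assumptions the four outcomes cover all 36 cells of the table, and the cell counts (4, 20, 2 and 10) show that the outcomes are far from equally likely.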