The Subtleties of Statistics
Statistics deals with uncertainty and randomness, which sets it apart from areas of mathematics where reasoning is deterministic and cause and effect are clear-cut. As a result, probability and statistics are rich in paradoxes, and one of the most notorious is Simpson’s Paradox.
What is Simpson’s Paradox?
Simpson’s Paradox occurs when a trend that appears in each of several groups of data reverses when the groups are combined. It is a cautionary tale for data analysis: a conclusion drawn from aggregated figures alone can be misleading if the structure of the underlying data is not understood.
Example 1: Medical Success Rates
- Doctor A: Cardiovascular surgery success rate of 77.8% (70/90), Suturing success rate of 100% (10/10)
- Doctor B: Cardiovascular surgery success rate of 20.0% (2/10), Suturing success rate of 90.0% (81/90)
| | Doctor A (successes / surgeries) | Doctor B (successes / surgeries) |
| Cardiovascular surgery | 70 / 90 = success rate 77.8% | 2 / 10 = success rate 20.0% |
| Suturing | 10 / 10 = success rate 100.0% | 81 / 90 = success rate 90.0% |
| Overall | 80 / 100 = success rate 80.0% | 83 / 100 = success rate 83.0% |
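Overall, Doctor B has a higher success rate (83.0%) than Doctor A (80.0%). Yet within each type of surgery, Doctor A has the higher success rate. The reversal comes from the mix of cases: 90 of Doctor A’s 100 operations were the harder cardiovascular surgeries, while 90 of Doctor B’s 100 were the easier suturing procedures.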
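The arithmetic in this example can be checked with a short script. This is a minimal sketch that uses only the counts from the table above; the names `results`, `rate`, and `overall_rate` are illustrative choices, not part of the original example.

```python
# (successes, surgeries) for each doctor and surgery type, copied from the table above
results = {
    "Doctor A": {"Cardiovascular surgery": (70, 90), "Suturing": (10, 10)},
    "Doctor B": {"Cardiovascular surgery": (2, 10), "Suturing": (81, 90)},
}


def rate(successes, total):
    """Success rate for a single category."""
    return successes / total


def overall_rate(by_type):
    """Pooled success rate: total successes divided by total surgeries."""
    successes = sum(s for s, _ in by_type.values())
    total = sum(n for _, n in by_type.values())
    return successes / total


# Compare the two doctors within each surgery type
for surgery in ("Cardiovascular surgery", "Suturing"):
    a = rate(*results["Doctor A"][surgery])
    b = rate(*results["Doctor B"][surgery])
    print(f"{surgery}: Doctor A {a:.1%} vs Doctor B {b:.1%}")

# Compare the two doctors after pooling both surgery types
overall_a = overall_rate(results["Doctor A"])
overall_b = overall_rate(results["Doctor B"])
print(f"Overall: Doctor A {overall_a:.1%} vs Doctor B {overall_b:.1%}")
```

Running it shows Doctor A ahead within each surgery type and behind once the two types are pooled.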
Example 2: The Berkeley Admissions Case
In 1973, graduate admissions figures at the University of California, Berkeley raised concerns about discrimination against women, because women’s acceptance rate was lower than men’s:
- Male Applicants: 44.0% acceptance (8,442 applicants)
- Female Applicants: 35.0% acceptance (4,321 applicants)
| | Applicants | Acceptance rate |
| Male | 8442 | 44.0% |
| Female | 4321 | 35.0% |
However, a department-by-department analysis told a different story. In several of the largest departments (A, B, and D in the table below), women had a higher acceptance rate than men. Women tended to apply to more competitive departments with low acceptance rates, which pulled down their overall acceptance rate.
| | Male: applicants (acceptance rate) | Female: applicants (acceptance rate) |
| A department | 825(62%) | 108(82%) |
| B department | 560(63%) | 25(68%) |
| C department | 325(37%) | 593(34%) |
| D department | 417(33%) | 375(35%) |
| E department | 191(28%) | 393(24%) |
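The two observations behind this example can be sketched in a few lines: which departments admitted women at a higher rate, and how differently men and women spread their applications. The sketch uses only figures quoted above. Treating departments A and B as the "least selective" ones is an assumption based on their acceptance rates in the table, and all variable names are illustrative.

```python
# Acceptance rates by department as (male, female), copied from the table above
rates = {
    "A": (0.62, 0.82),
    "B": (0.63, 0.68),
    "C": (0.37, 0.34),
    "D": (0.33, 0.35),
    "E": (0.28, 0.24),
}

# Applicants to departments A and B, the two with the highest acceptance rates
# in the table (assumption: these count as the "least selective" departments)
easy_applicants = {"male": 825 + 560, "female": 108 + 25}

# Total applicants across the whole school, from the overall figures above
total_applicants = {"male": 8442, "female": 4321}

# Departments where women's acceptance rate exceeds men's
women_ahead = [dept for dept, (m, f) in rates.items() if f > m]

# Share of each group's applications that went to departments A and B
share_easy = {
    group: easy_applicants[group] / total_applicants[group]
    for group in total_applicants
}

print("Departments where women had the higher rate:", women_ahead)
for group, share in share_easy.items():
    print(f"Share of {group} applicants applying to A or B: {share:.1%}")
```

A much larger share of the men's applications went to the two departments with the highest acceptance rates, which is the imbalance that drives the gap in the overall figures.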
Correct Interpretation of Statistical Data
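Simpson’s Paradox shows that how groups of data are combined matters. An overall rate is a weighted average of the group rates, weighted by group size, so when the groups differ sharply in both size and difficulty, the combined figure can point in the opposite direction from every individual group. Examining the data both in aggregate and by subgroup is therefore necessary to understand its structure.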
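One way to see why combining groups can flip a comparison is to write the overall rate as a weighted average of the group rates. The symbols here are notation introduced for illustration: $r_g$ is the success rate in group $g$, $n_g$ is the number of cases in that group, and $N$ is the total number of cases. The two worked lines apply the identity to the doctors in Example 1.

```latex
\[
  r_{\text{overall}} = \sum_{g} \frac{n_g}{N}\, r_g ,
  \qquad N = \sum_{g} n_g
\]

% Example 1: cardiovascular surgery first, suturing second
\[
  \text{Doctor A: } \frac{90}{100}\cdot\frac{70}{90} + \frac{10}{100}\cdot\frac{10}{10}
  = 0.70 + 0.10 = 0.80
\]
\[
  \text{Doctor B: } \frac{10}{100}\cdot\frac{2}{10} + \frac{90}{100}\cdot\frac{81}{90}
  = 0.02 + 0.81 = 0.83
\]
```

Doctor A’s weights fall mostly on the category with the lower success rate and Doctor B’s on the category with the higher one, so the overall figures reverse even though Doctor A leads in each category.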
Conclusion: The Need for Careful Analysis
The examples of Simpson’s Paradox demonstrate that while statistical data can be a powerful tool, its interpretation requires careful consideration. Analyzing and understanding data from multiple perspectives is crucial to prevent erroneous decision-making and to grasp the full picture.
