The Subtleties of Statistics
Statistics deals with uncertainty and randomness, which sets it apart from areas of mathematics that are deterministic and clear-cut in their logic. As a result, probability and statistics are rich in paradoxes, and one of the most notorious is Simpson’s Paradox.
What is Simpson’s Paradox?
Simpson’s Paradox occurs when a trend that appears in each of several groups of data reverses once the groups are combined. It serves as a cautionary tale in statistical analysis, showing how aggregate figures can mislead when the structure of the underlying data is ignored.
Example 1: Medical Success Rates
- Doctor A: Cardiovascular surgery success rate of 77.8% (70/90), Suturing success rate of 100% (10/10)
- Doctor B: Cardiovascular surgery success rate of 20.0% (2/10), Suturing success rate of 90.0% (81/90)
Surgery type | Doctor A (successes / surgeries) | Doctor B (successes / surgeries) |
Cardiovascular surgery | 70 / 90 = success rate 77.8% | 2 / 10 = success rate 20.0% |
Suturing | 10 / 10 = success rate 100.0% | 81 / 90 = success rate 90.0% |
Overall | 80 / 100 = success rate 80.0% | 83 / 100 = success rate 83.0% |
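Overall, Doctor B has the higher success rate: 83.0% compared with Doctor A’s 80.0%. Within each type of surgery, however, Doctor A has the higher success rate. The reversal comes from the case mix: 90 of Doctor A’s 100 operations were the more difficult cardiovascular surgeries, while 90 of Doctor B’s 100 were suturing, which succeeds far more often for both doctors.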
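The arithmetic behind this example can be checked with a short script. The following is a minimal sketch that uses only the counts from the table above; the names `results`, `per_type`, and `overall` are illustrative choices rather than anything from the original example.

```python
# (successes, surgeries) for each doctor and surgery type, taken from the table above.
results = {
    "Doctor A": {"Cardiovascular surgery": (70, 90), "Suturing": (10, 10)},
    "Doctor B": {"Cardiovascular surgery": (2, 10), "Suturing": (81, 90)},
}

def overall_rate(by_type):
    """Pool all surgery types: total successes divided by total surgeries."""
    total_successes = sum(successes for successes, _ in by_type.values())
    total_surgeries = sum(surgeries for _, surgeries in by_type.values())
    return total_successes / total_surgeries

# Success rate within each surgery type.
per_type = {
    doctor: {kind: s / n for kind, (s, n) in by_type.items()}
    for doctor, by_type in results.items()
}

# Success rate after combining the surgery types.
overall = {doctor: overall_rate(by_type) for doctor, by_type in results.items()}

for doctor in results:
    parts = ", ".join(f"{kind}: {rate:.1%}" for kind, rate in per_type[doctor].items())
    print(f"{doctor}: {parts}, Overall: {overall[doctor]:.1%}")
```

Doctor A comes out ahead in every row of `per_type`, yet behind in `overall`, which is exactly the reversal described above.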
Example 2: The UC Berkeley Admissions Case
In 1973, graduate admissions figures at the University of California, Berkeley prompted allegations of discrimination against women, because their acceptance rate was lower than men’s:
- Male Applicants: 44.0% acceptance (8,442 applicants)
- Female Applicants: 35.0% acceptance (4,321 applicants)
Group | Applicants | Acceptance rate |
Male | 8442 | 44.0% |
Female | 4321 | 35.0% |
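However, a department-by-department analysis showed that in most departments women had higher acceptance rates than men. Women tended to apply to more competitive departments with low acceptance rates, while men tended to apply to departments that admitted a larger share of their applicants. This difference in application patterns, rather than the decisions within departments, pulled women’s overall acceptance rate below men’s.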
Department | Male applicants (acceptance rate) | Female applicants (acceptance rate) |
A department | 825(62%) | 108(82%) |
B department | 560(63%) | 25(68%) |
C department | 325(37%) | 593(34%) |
D department | 417(33%) | 375(35%) |
E department | 191(28%) | 393(24%) |
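The department-level comparison can be sketched in a few lines. The snippet below is illustrative only: it uses the acceptance rates from the table, the applicant counts for departments A and B, and the overall applicant totals quoted earlier. The variable names are assumptions made for this sketch.

```python
# Acceptance rates (in percent) per department, copied from the table above.
rates = {
    "A": {"male": 62, "female": 82},
    "B": {"male": 63, "female": 68},
    "C": {"male": 37, "female": 34},
    "D": {"male": 33, "female": 35},
    "E": {"male": 28, "female": 24},
}

# Applicants to A and B, the two departments with the highest acceptance rates,
# and the total number of applicants of each sex.
high_rate_applicants = {"male": 825 + 560, "female": 108 + 25}
total_applicants = {"male": 8442, "female": 4321}

# Departments in which women were accepted at a higher rate than men.
women_ahead = [dept for dept, r in rates.items() if r["female"] > r["male"]]

# Share of each group that applied to the high-acceptance departments A and B.
share_high = {
    group: high_rate_applicants[group] / total_applicants[group]
    for group in total_applicants
}

print("Departments where women have the higher rate:", women_ahead)
for group, share in share_high.items():
    print(f"{group}: {share:.1%} of applicants chose department A or B")
```

Roughly one in six male applicants applied to the two departments with the highest acceptance rates, compared with about one in thirty female applicants, which is consistent with the explanation that application patterns drove the gap in overall rates.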
Correct Interpretation of Statistical Data
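Simpson’s Paradox shows that the way data are grouped and combined can change the conclusion. In both examples, a variable hidden in the aggregate figures (the type of surgery, the department applied to) is related both to the groups being compared and to the outcome. Ignoring it leads to a misleading conclusion, so aggregate and subgroup results should be examined together to understand the structure of the data.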
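One way to see why the reversal is possible is to write an overall rate as a weighted average of subgroup rates. The notation below is an assumption introduced for illustration: $p_i$ is the rate in subgroup $i$, $n_i$ is the number of cases in that subgroup, and $w_i$ is the subgroup's share of all cases. The numbers come from the surgery example.

```latex
\begin{align*}
p_{\text{overall}} &= \sum_i w_i \, p_i, \qquad w_i = \frac{n_i}{\sum_j n_j} \\[4pt]
p_{\text{Doctor A}} &= \frac{90}{100}\cdot\frac{70}{90} + \frac{10}{100}\cdot\frac{10}{10} = 0.70 + 0.10 = 0.80 \\[4pt]
p_{\text{Doctor B}} &= \frac{10}{100}\cdot\frac{2}{10} + \frac{90}{100}\cdot\frac{81}{90} = 0.02 + 0.81 = 0.83
\end{align*}
```

Doctor A's weights fall mostly on the surgery type with the lower success rate, and Doctor B's fall mostly on the type with the higher one. Because the two doctors have different weights, the combined rates can rank them in the opposite order from the subgroup rates.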
Conclusion: The Need for Careful Analysis
The examples of Simpson’s Paradox demonstrate that while statistical data can be a powerful tool, its interpretation requires careful consideration. Analyzing and understanding data from multiple perspectives is crucial to prevent erroneous decision-making and to grasp the full picture.