Pitfalls of Statistics – Simpson’s Paradox

The Subtleties of Statistics

Statistics deals with uncertainty and randomness, which sets it apart from areas of mathematics where reasoning is deterministic and cause and effect are clear-cut. As a result, probability and statistics are rich in paradoxes, and one of the most notorious is Simpson’s Paradox.

What is Simpson’s Paradox?

Simpson’s Paradox occurs when a trend that holds within each of several groups of data reverses once the groups are combined. It serves as a cautionary tale in data analysis: an aggregate figure can point to the wrong conclusion when the composition of the underlying groups is ignored.

Example 1: Medical Success Rates

  • Doctor A: Cardiovascular surgery success rate of 77.8% (70/90), Suturing success rate of 100% (10/10)
  • Doctor B: Cardiovascular surgery success rate of 20.0% (2/10), Suturing success rate of 90.0% (81/90)
Surgery type | Doctor A (successes / operations) | Doctor B (successes / operations)
Cardiovascular surgery | 70 / 90 = 77.8% | 2 / 10 = 20.0%
Suturing | 10 / 10 = 100.0% | 81 / 90 = 90.0%
Overall | 80 / 100 = 80.0% | 83 / 100 = 83.0%

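Overall, Doctor B has a higher success rate (83.0%) than Doctor A (80.0%). Within each type of surgery, however, Doctor A has the higher success rate. The reversal comes from the case mix: 90 of Doctor A’s 100 operations were the harder cardiovascular surgeries, while 90 of Doctor B’s 100 were the easier suturing procedures.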
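The arithmetic behind this example can be checked with a short Python sketch. The counts are taken from the table above; the names `results`, `success_rate`, and `overall_rate` are illustrative choices, not part of any library.

```python
# (successes, operations) per doctor and surgery type, copied from the table above
results = {
    "Doctor A": {"Cardiovascular surgery": (70, 90), "Suturing": (10, 10)},
    "Doctor B": {"Cardiovascular surgery": (2, 10), "Suturing": (81, 90)},
}


def success_rate(successes, operations):
    """Success rate for a single surgery type."""
    return successes / operations


def overall_rate(by_type):
    """Pooled success rate: total successes divided by total operations."""
    total_successes = sum(s for s, _ in by_type.values())
    total_operations = sum(n for _, n in by_type.values())
    return total_successes / total_operations


for doctor, by_type in results.items():
    for surgery, (s, n) in by_type.items():
        print(f"{doctor}, {surgery}: {success_rate(s, n):.1%}")
    print(f"{doctor}, overall: {overall_rate(by_type):.1%}")
```

The pooled rate is computed from the raw counts rather than by averaging the two percentages, which is why each doctor's case mix determines the overall figure.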

Example 2: The UC Berkeley Admissions Case

In 1973, a complaint was filed against the University of California, Berkeley, alleging discrimination against women in graduate admissions because their acceptance rate was lower than men’s:

  • Male Applicants: 44.0% acceptance (8,442 applicants)
  • Female Applicants: 35.0% acceptance (4,321 applicants)
Group | Applicants | Acceptance rate
Male | 8,442 | 44.0%
Female | 4,321 | 35.0%

However, a department-by-department analysis showed that in several of the largest departments women had higher acceptance rates than men. This suggests that women tended to apply to more competitive departments with low acceptance rates, and that this difference in where applicants applied pulled down their overall acceptance rate.

Department | Male applicants (acceptance rate) | Female applicants (acceptance rate)
A | 825 (62%) | 108 (82%)
B | 560 (63%) | 25 (68%)
C | 325 (37%) | 593 (34%)
D | 317 (33%) | 375 (35%)
E | 191 (28%) | 393 (24%)
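As a rough check of how application patterns drive the aggregate, the following Python sketch pools the five departments listed above. It is only an approximation: the acceptance rates in the table are rounded, and these five departments are a subset of all applicants, so the pooled figures will not match the campus-wide 44.0% and 35.0%. The names `pooled_rate` and `share_applying_to` are hypothetical helpers introduced for illustration.

```python
# (applicants, acceptance rate) per department, copied from the table above
men = {"A": (825, 0.62), "B": (560, 0.63), "C": (325, 0.37),
       "D": (317, 0.33), "E": (191, 0.28)}
women = {"A": (108, 0.82), "B": (25, 0.68), "C": (593, 0.34),
         "D": (375, 0.35), "E": (393, 0.24)}


def pooled_rate(group):
    """Applicant-weighted average of the department acceptance rates."""
    applicants = sum(n for n, _ in group.values())
    # Approximate number admitted, since the published rates are rounded
    admitted = sum(n * rate for n, rate in group.values())
    return admitted / applicants


def share_applying_to(group, departments):
    """Fraction of a group's applicants who applied to the given departments."""
    total = sum(n for n, _ in group.values())
    return sum(group[d][0] for d in departments) / total


women_ahead = [d for d in men if women[d][1] > men[d][1]]
print("Departments where women have the higher rate:", women_ahead)
print(f"Pooled rate, men:   {pooled_rate(men):.1%}")
print(f"Pooled rate, women: {pooled_rate(women):.1%}")
print(f"Share applying to A or B, men:   {share_applying_to(men, ['A', 'B']):.1%}")
print(f"Share applying to A or B, women: {share_applying_to(women, ['A', 'B']):.1%}")
```

Under these assumptions, women have the higher rate in three of the five departments, yet their pooled rate is lower, because most male applicants in this subset applied to the two departments with the highest acceptance rates.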

Correct Interpretation of Statistical Data

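Simpson’s Paradox shows that the way groups are combined matters as much as the numbers themselves. An aggregate rate is a weighted average of the group rates, so two aggregates can differ simply because the groups carry different weights. Reading the data both in aggregate and broken down by the relevant groups is necessary to understand its structure and avoid incorrect conclusions.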
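One way to make this kind of check routine is a small helper that compares two groups within every subgroup and again on the pooled data. The sketch below is a minimal illustration, not an established library function; `shows_simpson_reversal` and its input layout (subgroup name mapped to a `(successes, trials)` pair) are assumptions made for this example, and the sample data reuses the surgery counts from Example 1.

```python
def rate(successes, trials):
    """Proportion of successes among trials."""
    return successes / trials


def pooled(group):
    """Pooled rate across all subgroups, computed from raw counts."""
    return rate(sum(s for s, _ in group.values()),
                sum(n for _, n in group.values()))


def shows_simpson_reversal(group_x, group_y):
    """True if group_x beats group_y in every subgroup but loses when pooled.

    Each argument maps a subgroup name to (successes, trials).
    Both arguments must cover the same subgroups.
    """
    if group_x.keys() != group_y.keys():
        raise ValueError("both groups need the same subgroups")
    x_wins_everywhere = all(
        rate(*group_x[k]) > rate(*group_y[k]) for k in group_x
    )
    return x_wins_everywhere and pooled(group_x) < pooled(group_y)


# Surgery counts from Example 1
doctor_a = {"Cardiovascular surgery": (70, 90), "Suturing": (10, 10)}
doctor_b = {"Cardiovascular surgery": (2, 10), "Suturing": (81, 90)}

print(shows_simpson_reversal(doctor_a, doctor_b))
```

A helper like this only flags the pattern; deciding whether the subgroup view or the pooled view answers the question at hand still depends on how the data were generated.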

Conclusion: The Need for Careful Analysis

The examples of Simpson’s Paradox demonstrate that while statistical data can be a powerful tool, its interpretation requires careful consideration. Analyzing and understanding data from multiple perspectives is crucial to prevent erroneous decision-making and to grasp the full picture.
