Simpson’s paradox, or the Yule–Simpson effect, is a paradox in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.
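Stated in symbols (the notation is introduced here only for illustration): suppose that in stratum $k$ the first group has $a_k$ successes out of $m_k$ trials and the second group has $b_k$ successes out of $n_k$ trials. The paradox is that both of the following can hold at once:

```latex
\frac{a_k}{m_k} > \frac{b_k}{n_k} \quad \text{for every stratum } k,
\qquad \text{yet} \qquad
\frac{\sum_k a_k}{\sum_k m_k} < \frac{\sum_k b_k}{\sum_k n_k}.
```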
One of the best-known real-life examples of Simpson’s paradox occurred when the University of California, Berkeley, was sued for bias against women who had applied for admission to graduate school.
Gender | Applicants | Admitted |
---|---|---|
Male | 8442 | 44% |
Female | 4321 | 35% |
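# Load the helper my.cmh() from a local course script
source('~/Dropbox/Teaching/STAT766/mycmh.R', encoding = 'UTF-8')
# Counts implied by the table above: admitted and rejected men (3714, 4728),
# then admitted and rejected women (1512, 2809)
my.cmh(3714, 4728, 1512, 2809)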
##        Test Cochran Mantel.Haenszel
## 1 Statistic 95.7927         95.7852
## 2   P-value  0.0000          0.0000
These 1973 data show that men were more likely than women to be admitted, and the difference is too large to be plausibly attributed to chance: both test statistics above are about 95.8, with p-values that print as 0.0000.
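Because `my.cmh()` lives in a local script, the aggregate comparison can also be checked with base R. The following is a minimal sketch, assuming the same rounded counts as in the call above; `agg`, `rates`, and `res` are illustrative names. For a single 2×2 table, Pearson's chi-squared statistic without continuity correction should agree with the Cochran statistic reported above.

```r
# Aggregate 2x2 table: rows = gender, columns = admission outcome.
agg <- matrix(c(3714, 4728,
                1512, 2809),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("M", "F"),
                              Response = c("Admitted", "Rejected")))

# Admission rate per gender (row proportions).
rates <- prop.table(agg, margin = 1)[, "Admitted"]
print(round(rates, 3))

# Pearson chi-squared test without continuity correction.
res <- chisq.test(agg, correct = FALSE)
print(res)
```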
However, when the data are stratified by department, the picture changes. Women have the higher admission rate in four of the six departments shown:
Department | Male Applicants | Male Admitted | Female Applicants | Female Admitted |
---|---|---|---|---|
A | 825 | 62% | 108 | 82% |
B | 560 | 63% | 25 | 68% |
C | 325 | 37% | 593 | 34% |
D | 417 | 33% | 375 | 35% |
E | 191 | 28% | 393 | 24% |
F | 272 | 6% | 341 | 7% |
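The reversal can be reproduced directly from this table. The sketch below takes the applicant counts from the table and assumes the admitted counts used in the R call that follows (for example, 511 of 825 men in department A); all variable names are illustrative.

```r
dept  <- c("A", "B", "C", "D", "E", "F")
m_app <- c(825, 560, 325, 417, 191, 272)   # male applicants (from the table)
f_app <- c(108,  25, 593, 375, 393, 341)   # female applicants (from the table)
m_adm <- c(511, 353, 120, 138,  53,  16)   # assumed admitted counts, men
f_adm <- c( 89,  17, 202, 131,  94,  24)   # assumed admitted counts, women

# Admission rate per department and gender.
m_rate <- m_adm / m_app
f_rate <- f_adm / f_app
by_dept <- data.frame(dept,
                      m_rate = round(m_rate, 2),
                      f_rate = round(f_rate, 2),
                      women_higher = f_rate > m_rate)
print(by_dept)

# Pooled admission rate over the six departments.
pooled <- c(M = sum(m_adm) / sum(m_app), F = sum(f_adm) / sum(f_app))
print(round(pooled, 3))
```

Within these six departments women have the higher rate in four, yet the pooled rate favors men. The applicant counts suggest why: most men applied to departments A and B, where admission rates are highest, while most women applied to departments C through F.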
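# For each department: admitted men, rejected men, admitted women, rejected women.
# The final count (290) disagrees with the table, which implies 341 - 24 = 317;
# it is kept because the output below was computed from it.
Data2 <- as.table(array(
  c(511, 314, 89, 19,  353, 207, 17, 8,  120, 205, 202, 391,
    138, 279, 131, 244,  53, 138, 94, 299,  16, 256, 24, 290),
  dim = c(2, 2, 6),
  dimnames = list(Response = c("+", "-"),  # "+" admitted, "-" rejected
                  Gender = c("M", "F"),
                  Department = c("A", "B", "C", "D", "E", "F"))))
mantelhaen.test(Data2, correct = FALSE)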
##
## Mantel-Haenszel chi-squared test without continuity correction
##
## data: Data2
## Mantel-Haenszel X-squared = 1.6722, df = 1, p-value = 0.196
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.7671426 1.0556245
## sample estimates:
## common odds ratio
## 0.899897
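To see where the common odds ratio comes from, the Mantel-Haenszel estimate can be recomputed by hand. This is a sketch assuming the same counts as `Data2` above; `x`, `n`, `or_k`, and `or_mh` are illustrative names, and the last computation is the standard Mantel-Haenszel estimator written out explicitly.

```r
# Same counts as Data2: Response ("+", "-") x Gender (M, F) x Department (A-F).
x <- array(c(511, 314, 89, 19,  353, 207, 17, 8,  120, 205, 202, 391,
             138, 279, 131, 244,  53, 138, 94, 299,  16, 256, 24, 290),
           dim = c(2, 2, 6))

# Total number of applicants in each department.
n <- apply(x, 3, sum)

# Per-department odds ratio of admission, men relative to women.
or_k <- (x[1, 1, ] * x[2, 2, ]) / (x[1, 2, ] * x[2, 1, ])
print(round(or_k, 2))

# Mantel-Haenszel common odds ratio: weighted combination across departments.
or_mh <- sum(x[1, 1, ] * x[2, 2, ] / n) / sum(x[1, 2, ] * x[2, 1, ] / n)
print(round(or_mh, 4))
```

A common odds ratio below 1 means that, within departments, men's odds of admission were if anything slightly lower than women's. The confidence interval (0.77 to 1.06) contains 1 and the p-value is 0.196, so the stratified data give no significant evidence of bias against women.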