Simpson’s paradox, or the Yule–Simpson effect, is a paradox in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.

UC-Berkeley Example

One of the best-known real-life examples of Simpson’s paradox occurred when the University of California, Berkeley, was sued for bias against women who had applied for admission to graduate school.

Gender   Applicants   Admitted (%)
Male     8442         44%
Female   4321         35%
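# load the helper function my.cmh() from a local script (not included in these notes)
source('~/Dropbox/Teaching/STAT766/mycmh.R', encoding = 'UTF-8')
# pooled 2x2 counts: male admitted, male rejected, female admitted, female rejected
my.cmh(3714, 4728, 1512, 2809)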
##        Test Cochran Mantel.Haenszel
## 1 Statistic 95.7927         95.7852
## 2   P-value  0.0000          0.0000
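Because my.cmh() comes from a local file that is not reproduced here, the sketch below recomputes the two pooled statistics with base R only. It assumes that my.cmh() reports Cochran's statistic (which, for a single 2x2 table, equals the Pearson chi-square without continuity correction) and the Mantel-Haenszel statistic, which differs only by the factor (n - 1)/n. The printed values above are consistent with that assumption. The object names (pooled, cochran, mh) are illustrative.

```r
# Pooled admissions table (rows: gender, columns: outcome)
pooled <- matrix(c(3714, 4728,
                   1512, 2809),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Gender = c("Male", "Female"),
                                 Outcome = c("Admitted", "Rejected")))
n <- sum(pooled)

# Cochran's statistic for one 2x2 table = Pearson chi-square, no continuity correction
cochran <- unname(chisq.test(pooled, correct = FALSE)$statistic)

# Mantel-Haenszel uses the hypergeometric variance, which rescales by (n - 1) / n
mh <- cochran * (n - 1) / n

# Upper-tail p-values on 1 degree of freedom
p_values <- pchisq(c(cochran, mh), df = 1, lower.tail = FALSE)

print(round(rbind(Statistic = c(Cochran = cochran, Mantel.Haenszel = mh),
                  `P-value` = p_values), 4))
```

With n = 12763 the two statistics differ only in the second decimal place, so for a sample this large the choice between the binomial (Cochran) and hypergeometric (Mantel-Haenszel) variance makes no practical difference.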

These 1973 data show that men were admitted at a higher rate than women (44% versus 35%), and the difference is far too large to be explained by chance: both the Cochran and the Mantel-Haenszel statistics are about 95.8, with p-values below 0.0001.

However, when the data are stratified by department, women's admission rates are similar to or higher than men's in most departments:

Department   Male applicants   Male admitted   Female applicants   Female admitted
A            825               62%             108                 82%
B            560               63%             25                  68%
C            325               37%             593                 34%
D            417               33%             375                 35%
E            191               28%             393                 24%
F            272               6%              341                 7%
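To see where the aggregate gap comes from, the admission rates can be recomputed from raw counts. This sketch takes the admitted/rejected counts from the Data2 array in the code that follows (the object names are illustrative). For each department it computes the male and female admission rates, the department's overall admission rate, and the share of applicants who were women, and then collapses the six departments into a single pooled comparison.

```r
# Admitted/rejected counts per department (same values as the Data2 array)
counts <- rbind(
  A = c(511, 314,  89,  19),
  B = c(353, 207,  17,   8),
  C = c(120, 205, 202, 391),
  D = c(138, 279, 131, 244),
  E = c( 53, 138,  94, 299),
  F = c( 16, 256,  24, 290)
)
colnames(counts) <- c("male_adm", "male_rej", "female_adm", "female_rej")

male_n   <- counts[, "male_adm"] + counts[, "male_rej"]
female_n <- counts[, "female_adm"] + counts[, "female_rej"]

# Within-department rates, overall department rate, and share of female applicants
by_dept <- data.frame(
  male_rate    = counts[, "male_adm"] / male_n,
  female_rate  = counts[, "female_adm"] / female_n,
  dept_rate    = (counts[, "male_adm"] + counts[, "female_adm"]) / (male_n + female_n),
  female_share = female_n / (male_n + female_n)
)
print(round(by_dept, 2))

# Collapsing over departments gives the pooled admission rates
pooled_male   <- sum(counts[, "male_adm"]) / sum(male_n)
pooled_female <- sum(counts[, "female_adm"]) / sum(female_n)
print(round(c(male = pooled_male, female = pooled_female), 3))
```

With these counts, the pooled rates are about 46% for men and 31% for women, even though women's rate is higher in four of the six departments. The female_share and dept_rate columns show why: departments C through F, where women made up roughly half or more of the applicants, have the lowest overall admission rates.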
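# Response x Gender x Department; each group of four values is one department
# (male admitted, male rejected, female admitted, female rejected)
Data2 <- as.table(array(c(511, 314,  89,  19,   353, 207,  17,   8,   # A, B
                          120, 205, 202, 391,   138, 279, 131, 244,   # C, D
                           53, 138,  94, 299,    16, 256,  24, 290),  # E, F
                        dim = c(2, 2, 6),
                        dimnames = list(Response = c("Admitted", "Rejected"),
                                        Gender = c("M", "F"),
                                        Department = c("A", "B", "C", "D", "E", "F"))))
mantelhaen.test(Data2, correct = FALSE)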
## 
##  Mantel-Haenszel chi-squared test without continuity correction
## 
## data:  Data2
## Mantel-Haenszel X-squared = 1.6722, df = 1, p-value = 0.196
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.7671426 1.0556245
## sample estimates:
## common odds ratio 
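##          0.899897

After adjusting for department, the difference between men and women is no longer significant (Mantel-Haenszel X-squared = 1.67, p = 0.196). The common odds ratio of about 0.90 compares men's odds of admission with women's, so within departments it even leans slightly toward women, although its 95% confidence interval (0.77 to 1.06) includes 1. The aggregate gap arose because women applied mostly to departments that admitted a small share of all their applicants.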
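The Mantel-Haenszel quantities reported by mantelhaen.test() can also be computed directly from their formulas. The statistic is the squared sum, over departments, of (observed minus expected) in the men-admitted cell, divided by the sum of its hypergeometric variances. The common odds ratio is the sum of a_k d_k / n_k divided by the sum of b_k c_k / n_k. This sketch rebuilds the same 2 x 2 x 6 array as Data2 (the labels are chosen here for readability), computes both quantities by hand, and prints them next to the built-in results for comparison.

```r
# Same counts as Data2: Response x Gender x Department
x <- array(c(511, 314,  89,  19,   353, 207,  17,   8,
             120, 205, 202, 391,   138, 279, 131, 244,
              53, 138,  94, 299,    16, 256,  24, 290),
           dim = c(2, 2, 6),
           dimnames = list(Response = c("Admitted", "Rejected"),
                           Gender = c("M", "F"),
                           Department = LETTERS[1:6]))

men_adm   <- x["Admitted", "M", ]
women_adm <- x["Admitted", "F", ]
men_rej   <- x["Rejected", "M", ]
women_rej <- x["Rejected", "F", ]
n_k <- men_adm + women_adm + men_rej + women_rej

# Margins within each department
adm_k   <- men_adm + women_adm
rej_k   <- men_rej + women_rej
men_k   <- men_adm + men_rej
women_k <- women_adm + women_rej

# Conditional (hypergeometric) mean and variance of the men-admitted cell
expected <- adm_k * men_k / n_k
variance <- adm_k * rej_k * men_k * women_k / (n_k^2 * (n_k - 1))

# MH chi-square without continuity correction, 1 df
mh_stat <- sum(men_adm - expected)^2 / sum(variance)
mh_p    <- pchisq(mh_stat, df = 1, lower.tail = FALSE)

# MH common odds ratio: odds of admission for men relative to women
mh_or <- sum(men_adm * women_rej / n_k) / sum(women_adm * men_rej / n_k)

builtin <- mantelhaen.test(x, correct = FALSE)
print(c(by_hand_stat = mh_stat, builtin_stat = unname(builtin$statistic),
        by_hand_OR = mh_or, builtin_OR = unname(builtin$estimate)))
```

The hand-computed statistic and odds ratio should agree with mantelhaen.test() up to floating-point error. Writing out the per-department sums makes it clear that the test pools evidence across strata instead of pooling the raw counts.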