Yule–Simpson paradox

Suppose we have the following counts from a trial with a binary outcome:

counts

        event nonevent
treated   273       77
control   289       61

According to each of the following metrics, the risk of the event is lower in the treated group than in the control group: the risk difference is negative, and the risk ratio and odds ratio are both below 1.

Risk difference

\[ \text{risk difference} = {\frac{\text{event} \cap \text{treated}}{\text{treated}}} - {\frac{\text{event} \cap \text{control}}{\text{control}}} \]

riskdiff(counts)

[1] -0.04571429
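The text does not show how `riskdiff` is defined, so the following is only a minimal sketch of one definition that is consistent with the formula and the printed value. The `matrix()` call that rebuilds `counts` is likewise an assumption about how the table is stored.

```r
# Hypothetical reconstruction of the 2x2 table shown earlier.
counts <- matrix(
  c(273, 77,
    289, 61),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("treated", "control"), c("event", "nonevent"))
)

# Assumed definition: risk in the treated row minus risk in the control row.
riskdiff <- function(x) {
  risk <- x[, "event"] / rowSums(x)  # per-group proportion with the event
  unname(risk["treated"] - risk["control"])
}

riskdiff(counts)
```

Both groups contain 350 subjects, so the difference reduces to (273 - 289) / 350.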

Risk ratio

\[ \text{risk ratio} = {\frac{\text{event} \cap \text{treated}}{\text{treated}}} \bigg/ {\frac{\text{event} \cap \text{control}}{\text{control}}} \]

riskratio(counts)

[1] 0.9446367
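As with the risk difference, `riskratio` is not defined in the text. A sketch under the same assumptions (a 2x2 matrix with rows `treated`/`control` and columns `event`/`nonevent`) might look like this:

```r
# Hypothetical reconstruction of the 2x2 table shown earlier.
counts <- matrix(
  c(273, 77,
    289, 61),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("treated", "control"), c("event", "nonevent"))
)

# Assumed definition: risk in the treated row divided by risk in the control row.
riskratio <- function(x) {
  risk <- x[, "event"] / rowSums(x)  # per-group proportion with the event
  unname(risk["treated"] / risk["control"])
}

riskratio(counts)
```

Because the two groups are the same size here, the ratio is simply 273 / 289.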

Odds ratio

\[ \text{odds ratio} = {\frac{\text{treated} \cap \text{event}}{\text{treated} \cap \text{nonevent}}} \bigg/ {\frac{\text{control} \cap \text{event}}{\text{control} \cap \text{nonevent}}} \]

oddsratio(counts)

[1] 0.7483485
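The odds ratio can be sketched in the same way. Again, this `oddsratio` is an assumed definition that matches the formula above, not necessarily the one used to produce the output.

```r
# Hypothetical reconstruction of the 2x2 table shown earlier.
counts <- matrix(
  c(273, 77,
    289, 61),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("treated", "control"), c("event", "nonevent"))
)

# Assumed definition: odds of the event in the treated row
# divided by odds of the event in the control row.
oddsratio <- function(x) {
  odds <- x[, "event"] / x[, "nonevent"]  # per-group odds of the event
  unname(odds["treated"] / odds["control"])
}

oddsratio(counts)
```

Unlike the risk-based metrics, the odds divide events by nonevents rather than by the group total, which is why the odds ratio sits further from 1 than the risk ratio.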

Stratification can be misleading

Let’s now stratify the population by some variable with values \(A\) and \(B\):

strata

$A
        event nonevent
treated   192       71
control    55       25

$B
        event nonevent
treated    81        6
control   234       36
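For readers following along, one way to hold the two strata is a named list of 2x2 matrices. The helper `make_table` is a hypothetical name introduced only for this sketch.

```r
# Hypothetical helper: build a 2x2 table from four cell counts given row by row.
make_table <- function(cells) {
  matrix(
    cells,
    nrow = 2, byrow = TRUE,
    dimnames = list(c("treated", "control"), c("event", "nonevent"))
  )
}

# Assumed representation of the stratified counts printed above.
strata <- list(
  A = make_table(c(192, 71, 55, 25)),
  B = make_table(c(81, 6, 234, 36))
)

strata
```

Storing the strata as a list means any function that accepts a single table can be applied to each stratum with `lapply(strata, f)`.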

Notice that these stratified counts partition the original counts…

strata$A + strata$B

        event nonevent
treated   273       77
control   289       61

…so you would expect to draw the same overall conclusions about the effectiveness of the treatment, right?