Thread
Have you ever seen data that contradicts your expectations?

It could be due to Simpson's Paradox.

Trends in different groups can reverse when combined.

Let me explain this phenomenon.

🧵
Look at these data points.

The correlation is clearly positive.

The problem is that this is not the whole picture.

This is just a subgroup of our population.

1/10
In this picture, we have 3 subgroups.

Individually, each of them shows a positive correlation.

But if we put everything together, we have a negative trend.

2/10
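If you want to reproduce that picture yourself, here is a minimal sketch. The numbers are made up for illustration (they are not the data from the plot), and `pearson` is a small hand-written helper, not a library function.

```python
from math import sqrt

def pearson(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / sqrt(var_x * var_y)

# Three made-up subgroups: each one rises from left to right, but every
# subgroup sits further right and lower than the previous one.
groups = [
    [(x0 + i, y0 + i) for i in range(5)]
    for x0, y0 in [(0, 40), (10, 20), (20, 0)]
]

for g in groups:
    print("subgroup r =", round(pearson(g), 2))

combined = [p for g in groups for p in g]
print("combined r =", round(pearson(combined), 2))
```

Each subgroup has a positive correlation, while the combined points give a negative one.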
This is Simpson's paradox.

A trend appears in several groups of data but disappears or reverses when the groups are combined.

Let's see an example 🔽

3/10
A medical study on kidney stone removal in 1986 showed that:

- A new treatment had an 83% success rate.

- The old treatment had a 78% success rate.

One might conclude that the new treatment is better.

The problem is that Simpson’s Paradox is lurking in the data.

4/10
When researchers considered kidney stone size, the result was reversed.

The old treatment was better for both small and large kidney stones.

How is this possible?

Let's look at this table:

5/10
The new treatment was tested mostly on small stones, which are probably easier to treat, and it succeeded in 87% of those cases.

On large stones, it did much worse, with a 69% success rate.

But it was tested on far fewer large stones.

Therefore, its overall success rate is weighted more toward 87%.

6/10
The old treatment had more samples for large stones, so its overall result was "pulled" toward its lower, large-stone success rate.

So the old treatment was more successful for both stone sizes, yet it looked worse overall because of how the cases were split between the two groups.

7/10
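Here is the same arithmetic as a short sketch. The counts below are the ones commonly quoted for the 1986 study. They are an assumption on my part, not copied from the table in this thread, but they reproduce the 87%, 69%, 83% and 78% figures above.

```python
# (successes, cases) per treatment and stone size.
# Assumed counts, as commonly quoted for the 1986 kidney stone study.
data = {
    "old": {"small": (81, 87), "large": (192, 263)},
    "new": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, cases):
    """Success rate within a single group."""
    return successes / cases

def overall(groups):
    """Pooled success rate: total successes over total cases."""
    total_success = sum(s for s, _ in groups.values())
    total_cases = sum(n for _, n in groups.values())
    return total_success / total_cases

for name, groups in data.items():
    per_size = {size: round(100 * rate(*sc)) for size, sc in groups.items()}
    print(name, per_size, "overall:", round(100 * overall(groups)))
```

The pooled rate is just a weighted average of the per-size rates, with the number of cases as the weights. That is why each treatment's overall number lands close to the rate of the group where it had the most cases.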
How to avoid this paradox?

1. Consider all relevant variables (kidney stone size in the example)

2. Visualize the data to identify any underlying patterns

3. Be careful when interpreting aggregate data

8/10
That's it for today.

I hope you've found this thread helpful.

Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.

Thanks 😉

9/10

You should also join our newsletter, DSBoost.

We share:

• Interviews

• Podcast notes

• Learning resources

• Interesting collections of content

dsboost.substack.com

10/10