Week 7 | March 1, 2018

Instructor: Amanda Hickman

Don't Let the Data Lie To You

Dataset of The Week (10 Min)

Presenting: Sarah El Safty and Josh Slowiczek

Lies Data Tells

Some people want to lie.

You’re often going to find yourself working with numbers that were given to you by a source who has a vested interest in how your story turns out. Ask lots of questions. Be skeptical.

All data lies.

What do you think is the fastest way to reduce the number of unsolved rape cases in your precinct?

If there's a meaningful reward for moving the numbers, there's a real incentive to move the numbers without changing the underlying issue at all.

And if you see a hospital that advertises high surgical success rates, does that mean they have the best surgeons? Or that they only take easy cases?

VA hospitals are addicted to metrics, and those metrics almost always turn out to be gameable, often in ways that make problems worse.

90% of fetuses diagnosed with Down Syndrome are aborted.

All data has context.

It is made by people. People take shortcuts. They interpret things and make calls.

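IP addresses as a proxy for location will give you a ton of hits in Kansas, because geolocation databases often assign addresses they can't place to a default point near the geographic center of the U.S.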

All data has biases.

I've talked about this before, so I won't dwell on it, but 311 calls are not a random sample of lived experiences.

This is closely related to the ecological fallacy. If I tell you that states with more foreign-born residents have more wealthy households, what’s your next question? (Are foreign-born people more likely to be wealthy? No.) An older study found a positive state-by-state correlation between literacy and foreign-born population: states with large immigrant populations tended to have higher literacy rates. What that correlation can’t tell you is whether immigrants themselves are more likely to be literate.

http://blog.statwing.com/the-ecological-fallacy/ http://andrewgelman.com/2013/02/03/heuristics-for-identifying-ecological-fallacies/
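To make the ecological fallacy concrete, here is a minimal sketch in Python. Every number in it is invented for illustration (three hypothetical, equally sized states, with made-up foreign-born shares and literacy rates); none of it comes from the study mentioned above. It shows how a strong positive state-level correlation can coexist with foreign-born residents being less literate than native-born residents in every state.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
# (state, population, foreign-born share, native-born literacy, foreign-born literacy)
STATES = [
    ("A", 1000, 0.05, 0.80, 0.70),
    ("B", 1000, 0.15, 0.88, 0.78),
    ("C", 1000, 0.30, 0.96, 0.86),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def state_literacy(share, native_lit, foreign_lit):
    """Overall literacy rate of a state: a weighted mix of the two groups."""
    return (1 - share) * native_lit + share * foreign_lit


def pooled_literacy(states):
    """Literacy rate of each group, pooled across all states."""
    native_pop = native_literate = 0.0
    foreign_pop = foreign_literate = 0.0
    for _name, pop, share, native_lit, foreign_lit in states:
        native_pop += pop * (1 - share)
        native_literate += pop * (1 - share) * native_lit
        foreign_pop += pop * share
        foreign_literate += pop * share * foreign_lit
    return native_literate / native_pop, foreign_literate / foreign_pop


shares = [share for _, _, share, _, _ in STATES]
overall = [state_literacy(share, n, f) for _, _, share, n, f in STATES]

# Group-level view: states with more immigrants have higher literacy.
state_level_r = pearson(shares, overall)

# Individual-level view: compare the two groups directly.
native_rate, foreign_rate = pooled_literacy(STATES)

print(f"state-level correlation: {state_level_r:.2f}")
print(f"native-born literacy:    {native_rate:.3f}")
print(f"foreign-born literacy:   {foreign_rate:.3f}")
```

In this made-up data the state-level correlation is strongly positive, yet the foreign-born group is less literate in every state and in the pooled totals. The aggregate pattern comes from where immigrants live, not from anything about immigrants as individuals.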

Question order changes how people respond. This example is more than a decade old now, but in 2003, Americans were more likely to say they supported civil unions if they had already been asked whether they supported gay marriage.

Pew has a lot of great research about survey design.

I used to think correlation was causation but then I took a statistics course.

Where do you find data?

Who is stuck? Let's brainstorm getting unstuck.

SQL Bingo

Slides | Source

Next week (10 min)