Sunday, August 29, 2010

Did Twilight Cause the Recession?

Short answer - Of course not! Don't be silly.

But it's worth looking at anyway. First of all, here's the 'proof' I tweeted:

So there are a few things going on here:

Correlation Does Not Imply Causation

If you've ever visited Slashdot or the like, you'll have noticed that whenever there's a report showing a 'link' between two phenomena, there will be at least one comment declaring "correlation != causation".

In other words, just because two sets of data seem to show a link - even a "statistically significant" one - doesn't mean one of the variables caused the other.

In such cases, it's important to look at what's going on in more detail.

For example, say the link is "people who watch more than two hours of TV a day, on average, die ten years younger than those who don't" - watching TV in and of itself doesn't shorten your life span. But people who watch a lot of TV are more likely to have generally unhealthy lifestyles - e.g. they exercise less and eat more junk food.

That's more likely, but not necessarily. This also leads on to the next point:

Cause and Effect

So say you have your two variables that seem to correlate in a statistically significant way.

If we pretend the Twilight graph shows a 'significant correlation', did Twilight cause the recession? Seems highly unlikely.

Or is it more likely that the recession led to a rise in the popularity of books (in general)? Or did more people turn to fantasy as an escape from the harsh reality of the recession-ravaged world? Or is it just plain old coincidence?

Similarly, with the TV example you find that in fact both are most likely effects of the same thing - i.e. an unhealthy lifestyle. Neither caused the other.
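To make the "both are effects of the same thing" idea a bit more concrete, here's a rough simulation. Everything in it is made up for illustration: the hidden `unhealthiness` score, the formulas and the noise levels are my own assumptions, not real data. TV hours never appear in the life span formula, yet the two still correlate strongly until you compare people with a similar lifestyle.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical hidden variable: 0 = very healthy lifestyle, 1 = very unhealthy.
unhealthiness = [random.random() for _ in range(5000)]

# Both observed variables depend ONLY on the hidden one, plus noise.
# Note that tv_hours is never used to compute life_span.
tv_hours = [1 + 4 * u + random.gauss(0, 0.5) for u in unhealthiness]
life_span = [85 - 15 * u + random.gauss(0, 3) for u in unhealthiness]

# Correlation across everyone: strongly negative.
r_all = pearson(tv_hours, life_span)

# Correlation among people with a similar lifestyle: close to zero.
band = [(t, l) for u, t, l in zip(unhealthiness, tv_hours, life_span)
        if 0.45 <= u <= 0.55]
r_band = pearson([t for t, _ in band], [l for _, l in band])

print(f"everyone:          r = {r_all:.2f}")
print(f"similar lifestyle: r = {r_band:.2f}")
```

The first number should come out strongly negative and the second close to zero, which is what you'd expect if neither variable caused the other.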

And because of these first two reasons, always be wary of reported 'links' between things in the news (and in tabloids especially). It's not that hard to mislead or be misled by statistics*.

Confirmation Bias

Based on our pre-existing biases, when presented with a piece of evidence, we tend to ignore things that don't support our theory/opinion and clutch at the things that do.

This is why opposing sides of a debate can use the same piece of research to support opposing claims.

And anyone who doesn't like Twilight (and would like a link to be true) is more likely to see some sort of correlation in the graph than someone who is a massive fan.

And this can mean ignoring the fact that it's a fairly poor match. I mean, the overall shapes, if smoothed out, just about match up.

But then there are those massive spikes. And if you look at the details of the curves closely, they are actually quite different. Especially in the middle.

So depending on your own bias, you can look at that graph and argue for your opinion in either direction. Though anyone with any sense knows any match in this case is all just a weird coincidence.

And finally,

I 'Gamed' the Results

I started out trying to find something to blame Justin Bieber for. He became popular around the start of this year, so the thought was - what negative thing happened at the start of this year? The best I could think of was that the snow melted**. And this is the graph of that:

Which is sort of convincing. But I could do better.

I decided to have a go with Twilight instead; and looking at the graph noticed that it became popular around the time of the recession. I checked, and what I found was that Twilight was a massively more popular search and as a result flattened the recession graph to next to nothing.

So I tried a few variations and finally found the adequately convincing "Edward Cullen" one. Just the right popularity to line up with the recession curve - and, for whatever reason, it fit the general shape quite well.
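Here's a small sketch of why the popularity of the second search term matters so much. It assumes the chart puts every series on one shared 0-100 scale, with 100 being the single highest point across all the terms. The `shared_scale` helper and all the numbers are made up for illustration.

```python
def shared_scale(series_by_term):
    """Rescale every series so the highest value across ALL terms is 100."""
    peak = max(max(values) for values in series_by_term.values())
    return {term: [round(100 * v / peak) for v in values]
            for term, values in series_by_term.items()}

# Made-up raw search volumes, purely for illustration.
recession = [10, 20, 80, 100, 60, 40]
big_term = [500, 1000, 4000, 5000, 3000, 2000]  # same shape, 50x the volume
small_term = [12, 18, 70, 110, 65, 35]          # similar shape AND volume

# Paired with a massively more popular term, 'recession' is squashed flat.
flattened = shared_scale({"recession": recession, "big": big_term})

# Paired with a term of similar popularity, its shape stays visible.
comparable = shared_scale({"recession": recession, "small": small_term})

print(flattened["recession"])   # [0, 0, 2, 2, 1, 1]
print(comparable["recession"])  # [9, 18, 73, 91, 55, 36]
```

Same recession data both times; the only thing that changed is what it was plotted against.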

The point is, if you know how to tweak the variables (the search terms, the time scale, etc.), it's not that hard to lie with a fairly convincing graph. Most of the 'Sex and Lesbians' ones in yesterday's post were gamed.

The trick for most searches is - for the year view, you'll get either a peak or a dip around Christmas/New Year; for the week/month view, you'll get peaks or dips around the weekends.

It's just a matter of matching up two or more searches to imply something. And if possible, drawing attention away from the scale (which can be a dead giveaway).
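As a rough illustration of the Christmas trick, the sketch below builds two completely unrelated weekly series that just happen to share a holiday peak. The series, the size of the bump and the choice of the last two weeks as 'the holidays' are all assumptions of mine, not real search data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(1)  # fixed seed so the run is repeatable

WEEKS = range(52)
HOLIDAY_WEEKS = {50, 51}  # assumed: the last two weeks of the year

def weekly_series(base, holiday_bump):
    """Flat, noisy interest all year, plus a bump in the holiday weeks."""
    return [base + random.gauss(0, 2) + (holiday_bump if w in HOLIDAY_WEEKS else 0)
            for w in WEEKS]

# Two unrelated searches; the noise in each is independent of the other.
search_a = weekly_series(base=50, holiday_bump=40)
search_b = weekly_series(base=30, holiday_bump=40)

# Whole year: the shared holiday peak dominates the correlation.
r_year = pearson(search_a, search_b)

# Drop the holiday weeks and only the independent noise is left.
rest = [w for w in WEEKS if w not in HOLIDAY_WEEKS]
r_rest = pearson([search_a[w] for w in rest], [search_b[w] for w in rest])

print(f"whole year:       r = {r_year:.2f}")
print(f"holidays removed: r = {r_rest:.2f}")
```

The whole-year figure should come out high and the second one much nearer zero, so the 'match' is really just the calendar.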

There are other ways to manipulate results, though - like with the Twilight one. Have a play!

Oh, and in true tabloid headline style, I tweeted the 'proof' as a question. That way, it's not seen as though I'm making some sort of outlandish claim, and the viewer is encouraged to make their own judgement. Albeit a slightly swayed one.

As always, if you come up with anything good yourself, lemme know!


* I'm obviously not an expert on these matters; I only have a vague idea of what I'm talking about. But there are several books on the subject of statistical manipulation and bullshit worth reading:
How to Lie With Statistics - Darrell Huff
Damned Lies and Statistics - Joel Best
The Black Swan - Nassim Nicholas Taleb
And so on...

** Some would argue that the snow melting was a good thing. And to be honest, I'd be inclined to agree.
