The Hips Don’t Lie – Assuming That By Hips You Mean Numbers

For those who missed it, the United States went to the polls to elect a new President. Some people were surprised by the outcome.

Even Benedict Cumberbatch, seen here between takes on Sherlock Series 3.

Some people were not, including the new King of Quants, Nate Silver. Silver studied economics at the University of Chicago but really came to prominence in his predictions of baseball outcomes, based on his analysis of the associated statistics and sabermetrics. He correctly predicted, back in 2008, what would happen between Obama and Clinton, and he predicted, to the state, what the outcome would be in this year’s election, even in the notoriously fickle swing states. Silver’s approach isn’t secret. He looks at all of the polls and then generates a weighted average of them (very, very simplified) in order to value certain polls over others. You rerun some of the models, change some parameters, look at it all again and work out what the most likely scenario is. Nate’s been publishing this regularly on his FiveThirtyEight blog (that’s the number of electors in the electoral college, by the way, and I had to look that up because I am not an American) which is now a feature of the New York Times.

So, throughout the entire election, as journalists and the official voices have been ranting and railing, predicting victory for this candidate or that candidate, Nate’s been looking at the polls, adjusting his model and publishing his predictions. Understandably, when someone is predicting a Democratic victory, the opposing party is going to jump up and down a bit and accusing Nate of some pretty serious bias and poll fixing. However, unless young Mr Silver has powers beyond those of mortal men, fixing all 538 electors in order to ensure an exact match to his predictions does seem to be taking fixing to a new level – and, of course, we’re joking because Nate Silver was right. Why was he right? Because he worked out a correct mathematical model and  method that took into account how accurate each poll was likely to be in predicting the final voter behaviour and that reliable, scientific and analytic approach allowed him to make a pretty conclusive set of predictions.

There are notorious examples of what happens when you listen to the wrong set of polls, or poll in the wrong areas, or carry out a phone poll at a time when (a) only rich people have phones or (b) only older people have landlines. Any information you get from such biased polls has to be taken with a grain of salt and weighted to reduce a skewing impact, but you have to be smart in how you weight things. Plain averaging most definitely does not work because this assumes equal sized populations or that (mysteriously) each poll should be treated as having equal weight. Here’s the other thing, though, ignoring the numbers is not going to help you if those same numbers are going to count against you.

Example: You’re a student and you do a mock exam. You get 30% because you didn’t study. You assume that the main exam will be really different. You go along. It’s not. In fact, it’s the same exam. You get 35%. You ignored the feedback that you should have used to predict what your final numbers were going to be. The big difference here is that a student can change their destiny through their own efforts. Changing the mind of the American people from June to November (Nate published his first predictions in June) is going to be nearly impossible so you’re left with one option, apparently, and that’s to pretend that it’s not happening.

I can pretend that my car isn’t running out of gas but, if the gauge is even vaguely accurate, somewhere along the way the car is going to stop. Ignoring Nate’s indications of what the final result would be was only ever going to work if his model was absolutely terrible but, of course, it was based on the polling data and the people being polled were voters. Assuming that there was any accuracy to the polls, then it’s the combination of the polls that was very clever and that’s all down to careful thought and good modelling. There is no doubt that a vast amount of work has gone into producing such a good model because you have to carefully work out how much each vote is worth in which context. Someone in a blue-skewed poll votes blue? Not as important as an increasing number of blue voters in a red-skewed polling area. One hundred people polled in a group to be weighted differently from three thousand people in another – and the absence of certain outliers possibly just down to having too small a sample population. Then, just to make it more difficult, you have to work out how these voting patterns are going to turn into electoral college votes. Now you have one vote that doesn’t mean the difference between having Idaho and not having Idaho, you have a vote that means the difference between “Hail to the Chief” and “Former Presidential Candidate and Your Host Tonight”.

Nate Silver’s work has brought a very important issue to light. The numbers, when you are thorough, don’t lie. He didn’t create the President’s re-election, he merely told everyone that, according to the American people, this was what was going to happen. What is astounding to me, and really shouldn’t be, is how many commentators and politicians seemed to take Silver’s predictions personally, as if he was trying to change reality by lying about the numbers. Well, someone was trying to change public perception of reality by talking about numbers, but I don’t think it was Nate Silver.

This is, fundamentally, a victory for science, thinking and solid statistics. Nate put up his predictions in a public space and said “Well, let’s see” and, with a small margin for error in terms of the final percentages, he got it right. That’s how science is supposed to work. Look at stuff, work out what’s going on, make predictions, see if you’re right, modify model as required and repeat until you have worked out how it really works. There is no shortage of Monday morning quarterbacks who can tell you in great detail why something happened a certain way when the game is over. Thanks, Nate, for giving me something to show my students to say “This is what it looks like when you get data science right.”

Remind me, however, never to bet against you at a sporting event!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s