The 10-minute Nate Silver (or: why the right needs to find another line of attack)

The righty backlash against Nate Silverization seems to have put a bee in my bonnet. I mean, how low have things sunk that I’m tempted to link to an Ezra Klein column for making good arguments against such model-skepticism? Ezra Klein, people. So I decided to devote a whopping 10 minutes or so to showing why the righty criticism of all things 538 is so silly.

To listen to the right talk, the output of Nate Silver’s “538” (and other similar efforts) is just so counterintuitive and controversial – diagnoses often reached after looking at national popular-vote polling data (as if that means anything for a Presidential election), or thinking about aphorisms such as ‘a bad economy is bad for the incumbent!’ – that it requires either a big conspiracy theory, or (at best) illustrates that the Nate Silver Generation is totally putting themselves out on a limb and staking their somehow precarious reputations and complicated/opaque ‘models’ on the coming election, which will either confirm or (the idea/hope seems to be) decisively-refute Nate Silverness once and for all.

I disagree, and here’s why: I can basically replicate Nate Silver’s results (more or less) in 10 minutes of Excel hackery. I know because I just did.

And please keep in mind that I am a rabid, conspiratorial-minded righty and I totally want Romney to win. There is absolutely no room for confusion here about my biases: my clear, unmistakable bias is for finding reasons to doubt, discredit, and demonize Nate Silver. I’d love that!

But here’s the thing: those reasons are just not there in widely-available public polling data. And what righties need to understand is that’s all Nate Silver is really doing: taking polling data, throwing them in a big bingo machine, and seeing whose ball comes out as the winner.

So what did I do? Here are the ultra-advanced, super-secret, oh-so-quanty steps I took. You may want to take notes because I am ‘opening the kimono’ on my proprietary ‘electoral model’, and I’m not even charging you for it!

1. Use a web-browser to go to RealClearPolitics (RCP). They have two basic types of data I used: (a) a tally of electors that are ‘solid’ for either candidate, and (b) for each of those that aren’t ‘solid’, an average of state polling data. So, for example, right now it says that Obama has 142 in SOLID STATES, while Romney has 127. Similarly, it says that the average of whatever Ohio state polls indicates that Obama leads Romney 49-46.6% in that state. In passing let’s note that nobody thinks this site has a lefty bias; if anything, it is righty-biased.

2. Now, when I added up electors the SOLIDS plus the toss-up/swing states didn’t add up to 538. I realized that RCP is neither tallying South Carolina or South Dakota as solids nor showing them in the table of state averages. They seem pretty red so I just gave them to Romney as solids. So in fact my ‘model’ assumes that the solid tally is Obama 142, Romney 140.

3. There’s some funkiness with Maine and Nebraska’s electors – it looks like they each allocate an elector by congressional district or something equivalently unAmerican and communist – anyway I just attached that ‘extra elector’ to the state overall. So in my model, Nebraska has 5 electors not 4. Whatever. Once I did this the elector tally checked out to be 538, so I had all electors accounted for. Obviously I could deal with Maine/Nebraska’s weird extra elector more carefully in future work if I felt like. Similarly, if there are states that aren’t winner-take-all (I forget), I’m pretending they are. Doesn’t seem like a huge deal though.

4. Anyway once I had the solids (142-140), and the rest of the states laid out by poll data, the rest of the exercise is just to simulate a bunch (i.e. 10,000+) of elections and see who wins. In the words of one of my favorite grad school math professors, here I did the ‘dumbest possible thing’: to account for error in all these state poll numbers, for each state I just bumped the lead by a little bell-curve. I made these bell-curves in the most inefficient, dumb way (using Excel’s normsinv(rand())). So, say Obama is leading 49-46, so the polls show Obama+3. Then I draw a Gaussian random number; say it comes out -3.5. This means (in this scenario) the ‘real’ lead is not Obama+3 it is Romney+0.5, which means Romney will win the state. I did nothing special for turnout, for dependence on the economy, or any of that.

Anyway, do the above independently for every state, add up the Obama elector tally, see if it gets to 270 (I have read that Romney wins in a 269 tie), if it does the winner is Obama, if not, Romney. Count up Obama winners vs. total simulations (each laid out as rows in a spreadsheet), and that’s an Obama victory %. SERIOUSLY, THIS IS THE DUMBEST MONTE CARLO METHOD POSSIBLE. But easy as hell. I mean, just take a gander – you could totally probably do this too:

[Image: The Sonic Charmer(tm) model]

This is totally rocket-science. I should get a patent or a Fields Medal or something.

5. Some technical details: I broke out the Gaussians as a national error + a state error. [UPDATE: No actually I didn't, see ERRATA below.] I figured, there could be a ‘systematic’ error in all polling, and meanwhile, there could be (independent) errors in each of the states’ polling. The actual error in each state is the sum of the two (national/systematic + state/particular). Each type of error has its own standard deviation, and these are the ‘parameters’ of my ‘model’. I basically made them up, using a national standard deviation of 3% and a state deviation of 4% (so the total per-state standard deviation is sqrt(3^2 + 4^2) = 5%). Again, this is kind of made-up but you could play around with it. Bigger, more volatile error in these polling numbers would increase the model’s prediction for Romney (this might be obvious, as he appears to be behind, i.e. out-of-the-money in the electoral college…). I don’t think even the most contrarian poll-doubters think the polling error is more than something like +/- 5 points though.
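The steps above can be sketched in a few lines of Python instead of Excel. This is only a rough sketch under stated assumptions, not the actual spreadsheet: the function name, its parameters, and the swing-state list in the usage lines are all made up for illustration (only the 142/140 solid tallies and the 3%/4% standard deviations come from the text above). Setting `national_sd=0` gives the independent-errors version described in the ERRATA.

```python
import random


def obama_win_probability(solid_obama, solid_romney, swing_states,
                          state_sd=4.0, national_sd=0.0,
                          n_sims=10_000, seed=None):
    """Estimate the chance Obama reaches 270 electors.

    swing_states: list of (electors, obama_lead_in_points) tuples,
    one per non-solid state (hypothetical input format).
    """
    total = solid_obama + solid_romney + sum(ev for ev, _ in swing_states)
    if total != 538:
        raise ValueError(f"electors sum to {total}, expected 538")

    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # One shared 'systematic' bump per simulated election.
        national_bump = rng.gauss(0.0, national_sd)
        tally = solid_obama
        for electors, lead in swing_states:
            # Independent per-state bump, same idea as normsinv(rand()) * sd.
            simulated_lead = lead + national_bump + rng.gauss(0.0, state_sd)
            if simulated_lead > 0:
                tally += electors  # winner-take-all, as assumed in step 3
        if tally >= 270:  # a 269-269 tie is scored for Romney, as in step 4
            wins += 1
    return wins / n_sims


# Hypothetical placeholder swing states: (electors, Obama lead in points).
# NOT real polling data; they only make the elector total come to 538.
toy_swing_states = [(100, 5.0), (80, -4.0), (40, 1.0), (36, 2.4)]
p = obama_win_probability(142, 140, toy_swing_states,
                          state_sd=4.0, national_sd=3.0, seed=1)
print(f"Obama win probability (toy data): {p:.1%}")
```

To get anything meaningful out of it, swap the placeholder list for the actual state averages; the toy numbers say nothing about the real race.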

MY EARTH SHATTERING RESULT: According to this ‘model’, Obama has a 78.3% chance of winning. Hey let’s press F9 again to refresh those random numbers. 77.1%. Once more for good measure? 77.8%. Seems pretty replicable; however I slice it, a reasonable, easy, dumb model – based on publicly-available state polling data – puts a chance of an Obama victory in the high 70s.
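As a rough sanity check on that F9 wobble: assuming each spreadsheet row is an independent simulated election, a win percentage estimated from n rows has a binomial standard error of about sqrt(p(1-p)/n). The helper name below is made up for illustration.

```python
import math


def monte_carlo_standard_error(p, n_sims):
    """Binomial standard error of a win fraction p estimated from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# p of roughly 0.78 and 10,000 rows, as in the results above.
se = monte_carlo_standard_error(0.78, 10_000)
print(f"standard error: {100 * se:.2f} percentage points")
```

That comes to roughly 0.4 percentage points per refresh, so successive refreshes differing by a point or so is in the expected ballpark for 10,000 simulations.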

What does Nate Silver’s model say right now? 77.4%.

Well what do you know. Remind me again why we’re all supposed to be so skeptical of Nate Silver’s model? Actually, I’m asking the wrong question here: remind me why anyone’s supposed to even be impressed by it? Just freaking make your own.

The point is, the premise that Nate Silver’s output requires a bunch of skepticism and raised-eyebrows isn’t supportable at all. He is literally doing something more or less mathematically-indistinguishable from mapping state polls + poll error margins -> election outcome in the dumbest, most obvious possible way. So tell me then, why all the griping from the right? Obviously, it’s because they don’t like the output.

But that is dumb.

Now, keep in mind, I still think Romney could win. (I don’t think he will, but he totally could.) And there could certainly be something seriously wrong with all the polls. But if that’s what you think, then your gripe is with the polls, not with whatever Nate Silver’s black box is doing. Because again, whatever that black box is doing, it’s not coming out with a result markedly different from doing the most easy, obvious thing you can possibly do in 10 minutes of Excel work.

ERRATA (11/1): Actually I noticed a short-circuit in my sheet had prevented the systematic/national error bump from feeding anything. So really, in the above, the Sonic Charmer Electoral Model™ assumes no correlated error between state polls, and used a 4% per-state standard dev. This should explain, if you were wondering, and I know you were, why my model was already converging ok with 10,000 sims instead of 10,000 x 10,000 sims (which I think would break Excel if I tried it the way I’m doing). The national systematic-error/correlated component can be added in but would need more work. For future papers!
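For reference, here is what the national + state split would imply once wired up, assuming both components are independent zero-mean Gaussians with the 3% and 4% standard deviations quoted in step 5. The symbols are notation introduced here, not anything from the spreadsheet: N is the shared national error, S_i is the error particular to state i, and epsilon_i is the total error applied to state i's polled lead.

```latex
\begin{aligned}
\varepsilon_i &= N + S_i, \qquad N \sim \mathcal{N}(0, \sigma_N^2), \quad S_i \sim \mathcal{N}(0, \sigma_S^2) \\
\operatorname{Var}(\varepsilon_i) &= \sigma_N^2 + \sigma_S^2 = 3^2 + 4^2 = 25
  \quad\Rightarrow\quad \operatorname{sd}(\varepsilon_i) = 5 \\
\operatorname{Corr}(\varepsilon_i, \varepsilon_j) &= \frac{\sigma_N^2}{\sigma_N^2 + \sigma_S^2} = \frac{9}{25} = 0.36
  \qquad (i \neq j)
\end{aligned}
```

So under those made-up parameters any two states' polling errors would be about 36% correlated, versus 0% in the version that actually ran.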

21 Responses to The 10-minute Nate Silver (or: why the right needs to find another line of attack)

  1. joshua says:

    Spreadsheet, schmeadsheet. I would be more impressed if you made a BAR CHART!!!

  2. Matt says:

    Agreed. Silver is just aggregating poll results, not conducting polls of his own. Romney could certainly win, if for some reason the polls are incorrect and he wins Ohio. And the polls could be incorrect, which is why we have an election rather than just averaging all the polls.

    I think the focus is a reaction to democrats triumphantly pushing Silver’s conclusion because they like it. Silver, and all these polls, should just be reporting the facts but there is a perception that it isn’t that simple, that by reporting polls showing an Obama win they are trying to have some nefarious effect on the election, suppressing turnout (why vote, you’re going to lose) or some other thing.

  3. roystgnr says:

    It’s always so exciting when one first discovers that there’s overwhelming evidence for the proposition “most of the people who disagree with me are idiots”, then so disappointing when one first discovers overwhelming evidence for “most of the people who agree with me are idiots too”, then so depressing when one finally ponders “if I was an idiot myself, how would I be able to tell?”

  4. SkepticalCynical says:

    I agree that Silver isn’t really doing anything particularly controversial, and the right question to ask is whether there are systematic errors in the polling. Any such error is obviously going to have a large impact on your/Silver’s magic spreadsheet.

    But I’m still pretty sympathetic to the reaction here. Silver has a large megaphone at NYT and Smart People like Krugman are quick to declare his magic spreadsheet to be “Science!”. It looks like some pretty desperate clutching of straws, and I totally get why righties want to rub their nose in it a bit.

    For the record, I do believe there have been systematic errors in the polling, and I believe Romney will win Ohio and the election on Tuesday. But such an outcome wouldn’t make Silver’s work any more or less wrong than it is otherwise.

    • As far as I can tell, he ‘has a large megaphone’ only because up until about 4 years ago, most people in the media were too lazy/dumb to build the obvious magic-spreadsheet I just did. And STILL mostly aren’t doing it, leaving it to this one guy that NYT has given a blog, and thinking of his output as something hugely complicated and magical. Which is shocking when you think about it! I at least hope that campaigns were *internally* doing this sort of analysis prior to 2008, and just not broadcasting the fact…but maybe not?

      I think maybe the reason for my frustration with the righty backlash is that I would prefer they focus their imagination on coming up with *better* conspiracy- or other-theories as to why the state polls are all wrong (or at least why key state polls – Ohio, etc are wrong). Because, while I’d be pleased as punch to embrace your view that Romney will win Ohio (thus, the Ohio polls are something like 3% off), I’m *seeing nobody who’s come up with a good reason to believe that*. That makes it hard to sustain the fantasy, and I *really want to*! But instead, everyone’s obsessing over “Nate Silver’s model”. It’s a huge waste of creativity and conspiracy-mindedness!

      • A Lady says:

        did you read the number muncher’s stuff on nat’l review about why the ohio polls were wrong before it went down?

        • I’ve skimmed a bunch of commentary but seen nothing earth shattering.

          But fine, let’s say Ohio polls are wrong. In fact let’s say they’re way wrong. So, I go into my spreadsheet and *artificially* change the Ohio RCP state poll average by, oh I don’t know, 10 points (!).

          When I recalculate, what do I see? Obama is still the favorite, with a high-50s % chance to win.

          Now, I can do a different kind of adjustment – for example, if we believe that *all polls* are systematically underpredicting (R)s by 1.5%, then my model says Romney should be the favorite, with a ~55% chance to win.

          Obviously, everything depends on the state polling data and how reliable it is. My point is not that I believe state polling data. My point is that Nate Silver’s (or, Sonic Charmer’s) ‘model’ *isn’t really adding new information that wasn’t there in the state polling data*. If the state polling data are accurate, Obama is indeed the clear favorite, like Nate Silver says. If they’re not, he’s not, and so the ‘model is wrong’, but only because the *polls are wrong*.

          Now again, it could very well be that the polls are wrong. But the onus is on people who think that to explain and come up with good theories as to why that might be so. To pile on Nate Silver for *not artificially disbelieving polls in the way that you think that they should be disbelieved* really just makes no sense, and that wouldn’t be the purpose of such a model anyway.

          • A Lady says:

            Well, I read Nate back in 2008 when he was just a little pepper and he was doing a lot of recasting and such about bad skews and undersampling and all that. IOW, you’re right that the pile-on is for the wrong reasons. Nate used to do more actual wishcasting and I remember that Nate Silver, who was happy to artificially disbelieve polls then.

            The polling horserace is my current hobby for another few days. It’s all mail-in voting here, so the horserace is the only fun part left since we can’t go and physically vote for Spaceman Spiff anymore (a perennial local favorite).

            I like the conspiracy theories that the polling is wrong as a vote-depressing tactic and/or a bulwark to aid in the margin of cheatins’ after Nov 6th.

            • I like conspiracy theories too..I think I just want conspiracy theories I can believe in :-)

  5. A Lady says:

    And I also wish Clive Barker movies were better. It is weird how crappy they are, even though some of the stories filmed were pretty decent ones.

    • I know huh! Maybe it’s just that they weren’t given big enough budgets…

  7. Ian says:

    I think that a big part of lefty denial of HBD is that they’d rather live in a make-believe world that is as it ideally should be than live in it as it is – it is more pleasant for them to dreamily imagine that, with enough of the right social interventions, all people can live in harmony as brothers, than to wake up and realize that black people in appreciable numbers can and will never constructively participate in modern civilization.

    And I think that a big part of right denial of Nate Silver’s numbers is that hey’d rather live in a make-believe world that is as it ideally should be than live in it as it is – it is more comfortable to dreamily imagine that the more competent, intelligent, worthy, and actually-American candidate has a good chance to win than to wake up and realize that the nation is already lost, and, as an Auster commenter phrased it today, “the human tidal wave of non-white, welfare-dependent immigration which has altered forever the electoral map of America, thus making it possible in the very near future for a Democratic candidate to win with as little as 33% of the white vote.”

  10. Dave says:

    What justification do you have for distributing the error normally around the poll results? I suspect that the tails are much fatter than a regular Gaussian distribution would imply.

    • Ok. What distribution(s) shall I use?

      It’s a cuffed model, and the errors/distributions are made-up. You’re right. Feel free to suggest a refinement. As it happens, I suspect my error bars are actually too big. 4% s.d. means that 5% of the time I am simulating scenarios where a state% differs from polls by 8+% (!). So sometimes in my model, Romney wins Oregon, etc. This high of an s.d., if anything, tilts the model result in Romney’s favor.

      I don’t think it will materially affect the result (for any reasonable choice of error distro), FYI. As long as RCP has Obama leading in Ohio, and a couple other key states, it would be pretty mathematically impossible to construct a symmetric error distribution that *wouldn’t* put Obama% in something like the 60-85% range.

      But, feel free to try, and I’ll plug it in.

      • Dave says:

        I have no idea, but simple assumptions w.r.t. distributions here just seem unjustified. However, I do agree with your point – the smart money has to be on Obama now. Intrade puts him at 66%.

        The problem is that the electoral college is extremely sensitive to error in very nonlinear ways. Thinking about the outcome in any particular state is one thing, but trying to tie all the states together in one probability value just seems pointless to me (given all the unjustified assumptions made along the way).

        • They are totally and completely unjustified. I made them up! But what I’m trying to get across is, unless there’s some reason to believe that perfecting the error distribution model will change the results materially – or someone can come along and tell me what the Metaphysically Correct Error Distribution is supposed to be – it’s really just not worth my or anyone’s time to agonize over them.

          Especially not for a 10-minute model! :-)

          Meanwhile, the EC may be sensitive to error in ‘nonlinear’ ways, but it’s still monotone. A fatter error distribution will generate more scenarios in which Romney magically steals California; a tighter one will generate fewer such scenarios. So it’s a situation similar to ‘volatility’ in option pricing; it’s perfectly true that if I give these polls more volatility, the ‘Romney wins’ option gets more valuable.

          But it doesn’t go above 50%. Not with these RCP numbers.

        • You also hint at a big question mark in any such model, ‘trying to tie all the states together’. This goes to the question: when polls are ‘wrong’, are those errors independent or linked? This is analogous to the ‘correlation’ used in CDO pricing, and indeed, my model is a poor-man’s version since it has a ‘correlation’ baked into it (through the national-systematic-error). I believe that with different ‘correlations’ I could get my model to spit out Obama% in the 65-85% range, so there is some dependence.

          But once again, there’s no reasonable way to get Obama under 50% with this lay of the land. So I didn’t really sweat it.

