The righty backlash against Nate Silverization seems to have put a bee in my bonnet. I mean, how low have things sunk that I’m tempted to link to an Ezra Klein column for making good arguments against such model-skepticism? Ezra Klein, people. So I decided to devote a whopping 10 minutes or so to showing why the righty criticism of all things 538 is so silly.
To listen to the right talk, the output of Nate Silver’s “538” (and other similar efforts) is just so counterintuitive and controversial – diagnoses often reached after looking at national popular-vote polling data (as if that means anything for a Presidential election), or after pondering aphorisms such as ‘a bad economy is bad for the incumbent!’ – that it requires either a big conspiracy theory or, at best, shows that the Nate Silver Generation is way out on a limb, staking its somehow-precarious reputation and complicated/opaque ‘models’ on the coming election. That election, the idea/hope seems to be, will either confirm or decisively refute Nate Silverness once and for all.
I disagree, and here’s why: I can basically replicate Nate Silver’s results (more or less) in 10 minutes of Excel hackery. I know because I just did.
And please keep in mind that I am a rabid, conspiratorial-minded righty and I totally want Romney to win. There is absolutely no room for confusion here about my biases: my clear, unmistakable bias is for finding reasons to doubt, discredit, and demonize Nate Silver. I’d love that!
But here’s the thing: those reasons are just not there in widely-available public polling data. And what righties need to understand is that’s all Nate Silver is really doing: taking polling data, throwing them in a big bingo machine, and seeing whose ball comes out as the winner.
So what did I do? Here are the ultra-advanced, super-secret, oh-so-quanty steps I took. You may want to take notes because I am ‘opening the kimono’ on my proprietary ‘electoral model’, and I’m not even charging you for it!
1. Use a web browser to go to RealClearPolitics.com. They have two basic types of data I used: (a) a tally of electors that are ‘solid’ for either candidate, and (b) for each state that isn’t ‘solid’, an average of state polling data. So, for example, right now it says that Obama has 142 in SOLID STATES, while Romney has 127. Similarly, it says that the average of whatever Ohio state polls indicates that Obama leads Romney 49–46.6% in that state. In passing let’s note that nobody thinks this site has a lefty bias; if anything, it is righty-biased.
2. Now, when I added up electors, the SOLIDS plus the toss-up/swing states didn’t add up to 538. I realized that RCP isn’t tallying South Carolina or South Dakota as solids, nor showing them in the table of state averages. They seem pretty red so I just gave them to Romney as solids. So in fact my ‘model’ assumes that the solid tally is Obama 142, Romney 140.
3. There’s some funkiness with Maine and Nebraska’s electors – it looks like they each allocate an elector by congressional district or something equivalently un-American and communist – anyway, I just attached that ‘extra elector’ to the state overall. So in my model, Nebraska has 5 electors, not 4. Whatever. Once I did this the elector tally checked out to be 538, so I had all electors accounted for. Obviously I could deal with Maine/Nebraska’s weird extra elector more carefully in future work if I felt like it. Similarly, if there are states that aren’t winner-take-all (I forget), I’m pretending they are. Doesn’t seem like a huge deal though.
4. Anyway, once I had the solids (142–140), and the rest of the states laid out by poll data, the rest of the exercise is just to simulate a bunch (i.e. 10,000+) of elections and see who wins. In the words of one of my favorite grad school math professors, here I did the ‘dumbest possible thing’: to account for error in all these state poll numbers, for each state I just bumped the lead by a little bell curve. I made these bell curves in the most inefficient, dumb way (using Excel’s normsinv(rand())). So, say Obama is leading 49–46, so the polls show Obama+3. Then I draw a Gaussian random number; say it comes out -3.5. This means (in this scenario) the ‘real’ lead is not Obama+3, it is Romney+0.5, which means Romney will win the state. I did nothing special for turnout, for dependence on the economy, or any of that.
Anyway, do the above independently for every state, add up the Obama elector tally, and see if it gets to 270 (I have read that Romney wins in a 269 tie); if it does, the winner is Obama; if not, Romney. Count up Obama wins vs. total simulations (each simulation laid out as a row in the spreadsheet), and that’s an Obama victory %. SERIOUSLY, THIS IS THE DUMBEST MONTE CARLO METHOD POSSIBLE. But easy as hell. I mean, just take a gander – you could totally probably do this too.
5. Some technical details: I broke out the Gaussians as a national error + a state error. [UPDATE: No actually I didn’t, see ERRATA below.] I figured, there could be a ‘systematic’ error in all polling, and meanwhile, there could be (independent) errors in each of the states’ polling. The actual error in each state is the sum of the two (national/systematic + state/particular). Each type of error has its own standard deviation; these are the ‘parameters’ of my ‘model’. I basically made them up, using a national standard deviation of 3% and a state deviation of 4% (so the total per-state standard deviation is sqrt(3^2+4^2) = 5%). Again, this is kind of made up, but you could play around with it. More volatile (i.e. larger) error in these polling numbers would increase the model’s prediction for Romney (this might be obvious, as he appears to be behind, i.e. out-of-the-money in the electoral college…). I don’t think even the most contrarian poll-doubters think the polling error is more than something like +/- 5 points though.
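If Excel isn’t your thing, the whole exercise above fits in a few lines of Python. To be clear about what’s a placeholder here: the toss-up margins and elector counts in the dictionary below are made-up stand-ins, not the actual RCP averages (you’d plug in the real table), and for brevity this sketch skips the national/state error split and just uses a single 5-point per-state standard deviation.

```python
import random

# Solid electors, after hand-assigning SC and SD to Romney (step 2).
SOLID_OBAMA, SOLID_ROMNEY = 142, 140

# Non-solid states: (electoral votes, Obama's polled lead in points).
# These margins are ILLUSTRATIVE PLACEHOLDERS, not real RCP averages.
TOSSUPS = {
    "OH": (18, 2.4), "FL": (29, -1.5), "VA": (13, 0.2),
    "CO": (9, 0.7), "IA": (6, 2.0), "NV": (6, 2.8),
    "WI": (10, 4.2), "NH": (4, 1.3), "NC": (15, -3.0),
}

STATE_SD = 5.0  # per-state polling-error standard deviation, in points

def simulate_once(rng):
    """One simulated election: bump each polled lead by a Gaussian error,
    then hand all of a state's electors to whoever ends up ahead."""
    obama = SOLID_OBAMA
    for electors, lead in TOSSUPS.values():
        if lead + rng.gauss(0, STATE_SD) > 0:
            obama += electors
    return obama

def obama_win_prob(n_sims=10_000, seed=0):
    rng = random.Random(seed)
    # Score a 269-269 tie as a Romney win, as noted above.
    wins = sum(simulate_once(rng) >= 270 for _ in range(n_sims))
    return wins / n_sims
```

With the real RCP margins swapped in for the placeholders, that’s the whole ‘model’: Python’s `rng.gauss(0, sd)` is drawing from the same bell curve as Excel’s normsinv(rand()) scaled by the standard deviation.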
MY EARTH-SHATTERING RESULT: According to this ‘model’, Obama has a 78.3% chance of winning. Hey, let’s press F9 again to refresh those random numbers. 77.1%. Once more for good measure? 77.8%. Seems pretty replicable; however I slice it, a reasonable, easy, dumb model – based on publicly available state polling data – puts the chance of an Obama victory in the high 70s.
What does Nate Silver’s model say right now? 77.4%.
Well what do you know. Remind me again why we’re all supposed to be so skeptical of Nate Silver’s model? Actually, I’m asking the wrong question here: remind me why anyone’s supposed to even be impressed by it? Just freaking make your own.
The point is, the premise that Nate Silver’s output requires a bunch of skepticism and raised-eyebrows isn’t supportable at all. He is literally doing something more or less mathematically-indistinguishable from mapping state polls + poll error margins -> election outcome in the dumbest, most obvious possible way. So tell me then, why all the griping from the right? Obviously, it’s because they don’t like the output.
But that is dumb.
Now, keep in mind, I still think Romney could win. (I don’t think he will, but he totally could.) And there could certainly be something seriously wrong with all the polls. But if that’s what you think, then your gripe is with the polls, not with whatever Nate Silver’s black box is doing. Because again, whatever that black box is doing, it’s not coming out with a result markedly different from doing the most easy, obvious thing you can possibly do in 10 minutes of Excel work.
ERRATA (11/1): Actually, I noticed a short-circuit in my sheet had prevented the systematic/national error bump from feeding into anything. So really, in the above, the Sonic Charmer Electoral Model™ assumes no correlated error between state polls, and used a 4% per-state standard deviation. This should explain, if you were wondering, and I know you were, why my model was already converging ok with 10,000 sims instead of 10,000 x 10,000 sims (which I think would break Excel if I tried it the way I’m doing it). The national systematic-error/correlated component can be added in but would need more work. For future papers!
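For what it’s worth, outside of Excel the correlated component is only a couple of extra lines: draw one shared ‘national’ error per simulated election, then add a fresh independent error per state on top of it. A hedged sketch (the `tossups` dict here is whatever (electors, lead) table you’re using; nothing below is the actual RCP data):

```python
import random

NATIONAL_SD = 3.0  # shared 'systematic' error across all state polls
STATE_SD = 4.0     # independent per-state error
# Combined per-state sd = sqrt(3**2 + 4**2) = 5, as in step 5.

def simulate_once_correlated(tossups, solid_obama, rng):
    # One national draw, shared by every state in this simulated election...
    national_err = rng.gauss(0, NATIONAL_SD)
    obama = solid_obama
    for electors, lead in tossups.values():
        # ...plus a fresh independent draw for each state.
        if lead + national_err + rng.gauss(0, STATE_SD) > 0:
            obama += electors
    return obama
```

Note that correlation costs one extra random draw per simulated election, not per state – no 10,000 x 10,000 grid required.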