Poll data access is stone-age
November 4, 2012, 7:10 pm

Given the importance of state polls in fiddling around with modeling the electoral-vote outcome, I’ve noticed a somewhat shocking gap in news coverage:

It’s kinda hard to get actual state poll data by pollster. You know, something HIGHLY ADVANCED like

but with all the most up-to-date numbers filled in.

You can go to RealClearPolitics and get their ‘averaged polls’. I don’t know which pollsters they average. I don’t know when/how often they update the averages. And they seem to pick and choose which states they feel like showing and which they don’t bother showing because they’re ‘safe’.

You can go to Rasmussen and hover your mouse over a map of the US. If you’re careful and have good hand-eye coordination, some numbers will pop up. To one digit (like “Romney 50 – Obama 47”). I guess you have to do this 51 times to populate the whole table manually. (?!)

I assume you could go to other pollsters one by one and do something similar. Scrape out data manually, often state by state (each state on its own separate page for your added convenience!).

But where can you go just to freaking get all the raw data at once?

You can read news articles with 7 paragraphs of fluff and two quotes, whose actual content is basically just ‘Gallup, updated poll, state XX, Obama Y% Romney Z%’, with the actual numbers reported in paragraph 5.

You can go to a major commercial news site and they might have a map with flash animation and some graphs. They might arrange states into ‘safe’, ‘leans’, ‘tossups’ based on averaging whichever polls they’ve decided to average, and show a big red/pink and blue/light blue bar fighting each other.

But where’s just the raw data?

If election reporting weren’t in the stone age, every major news site would have a page, just one click away from their front page, showing nothing but the table above, with numbers filled in. Yes, there probably needs to be an extra column (or hover-over data) per cell to show the Date Of Last Update, since these pollsters come out with their polls at all different times and these numbers can quickly become stale.
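To make the point concrete, here is a minimal sketch in Python of how little such a page needs. Everything in it is an assumption for illustration: the `Poll` record, the `pivot` and `render` helpers, the placeholder states "XX" and "YY", the pollster names and all of the numbers. It keeps the most recent poll per state-and-pollster cell and prints the date of last update next to each number.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Poll:
    state: str       # placeholder state code, e.g. "XX"
    pollster: str    # placeholder pollster name
    obama: int       # whole-number percentage
    romney: int      # whole-number percentage
    updated: date    # date of last update for this cell


# Made-up rows, purely to exercise the code. Not real polling data.
POLLS = [
    Poll("XX", "Pollster A", 47, 50, date(2012, 11, 1)),
    Poll("XX", "Pollster B", 49, 48, date(2012, 10, 28)),
    Poll("YY", "Pollster A", 51, 45, date(2012, 11, 2)),
]


def pivot(polls):
    """Return {state: {pollster: Poll}}, keeping the newest poll per cell."""
    table = {}
    for p in polls:
        row = table.setdefault(p.state, {})
        current = row.get(p.pollster)
        if current is None or p.updated > current.updated:
            row[p.pollster] = p
    return table


def render(table):
    """Render the pivoted table as plain text, one state per line."""
    pollsters = sorted({name for row in table.values() for name in row})
    lines = ["State | " + " | ".join(pollsters)]
    for state in sorted(table):
        cells = []
        for name in pollsters:
            p = table[state].get(name)
            # A missing cell means that pollster has no poll for that state.
            cells.append(
                f"O{p.obama}-R{p.romney} ({p.updated:%m/%d})" if p else "-"
            )
        lines.append(state + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pivot(POLLS)))
```

The empty cells are part of the point: they show at a glance which pollsters have not polled a state, and the dates show which numbers have gone stale.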

But c’mon. It’s really not that hard. I guess if this were something really important like some obscure stock’s intraday quotes, it would all be available, with add-ins and APIs so you could pull in realtime data effortlessly. But, this stuff isn’t important; it’s just a Presidential election. Whatever, right? So, like, RealClearPolitics averages it is.

If I had easy access to this data, after a couple minutes of number-crunching I could fill out the rest of this table for you, to finish up my Sonic Charmer Electoral Model™ reporting & close the chapter on my Silverbating™ odyssey:

This would even help prove a point, that to some extent there are dueling pollsters in this election, and they are actually predicting a wider range of outcomes than mainstream reporting might lead one to think. Again, I’m not sure it’s fully been grasped by anyone that as of this moment Rasmussen is implicitly predicting a Romney victory. This also illustrates that ‘averaging’ a bunch of state polls might lead to somewhat unstable results.
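A toy sketch of that point, in Python. This is not the Sonic Charmer Electoral Model, just a winner-take-all tally under made-up inputs: the three placeholder states, their electoral-vote counts, the two pollsters and every margin are assumptions chosen for illustration. It tallies electoral votes separately for each pollster, then for the average of the two, and then again after nudging a single poll by two points.

```python
# Hypothetical electoral votes for three placeholder states.
EV = {"XX": 20, "YY": 10, "ZZ": 15}

# MARGINS[pollster][state] = Obama % minus Romney % (made-up numbers).
MARGINS = {
    "Pollster A": {"XX": -3, "YY": 2, "ZZ": -1},
    "Pollster B": {"XX": 2, "YY": 4, "ZZ": 3},
}


def tally(margins_by_state):
    """Winner-take-all tally; returns (Obama EV, Romney EV). Ties go to nobody."""
    obama = sum(EV[s] for s, m in margins_by_state.items() if m > 0)
    romney = sum(EV[s] for s, m in margins_by_state.items() if m < 0)
    return obama, romney


def average(margins):
    """Average each state's margin over the pollsters that polled it."""
    states = {s for by_state in margins.values() for s in by_state}
    avg = {}
    for s in states:
        vals = [by_state[s] for by_state in margins.values() if s in by_state]
        avg[s] = sum(vals) / len(vals)
    return avg


if __name__ == "__main__":
    for name, by_state in MARGINS.items():
        print(name, tally(by_state))
    print("Average", tally(average(MARGINS)))

    # Move one poll in one state by two points and re-average.
    nudged = {name: dict(by_state) for name, by_state in MARGINS.items()}
    nudged["Pollster B"]["XX"] = 4
    print("Average after nudge", tally(average(nudged)))
```

With these invented inputs the two pollsters imply opposite winners, and the averaged tally flips the largest state when one poll moves by two points. That is the sense in which averaging close state polls can be unstable.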

But, I guess I can’t do any of that, since the data is so annoyingly hard to locate. Unless someone knows of an obvious site I just missed?

UPDATE: As I said in comments, this whine-bleg post metaphysically helped me find this site, which does have raw data. Took a look, but it’s not that usable for what I wanted; it turns out state polling data is by and large pretty sparse. Rasmussen has done the most polling but I already have their #s; I keep reading that Gallup is ‘respected’ but Gallup is not in there; etc.

The other interesting thing I noticed: that site is pathologically anti-Rasmussen. They have a whole section devoted to summarizing polls ‘Rasmussen-free’. Sure, I had known that Rasmussen was considered to be the ‘conservative’ pollster but hadn’t quite realized the depth of venom. What is their explanation (yes they have a whole page devoted to explaining) why Rasmussen is not to be trusted? Well first, Nate Silver did some sort of analysis of their 2008 numbers because they were off. YAWN. You mean 2008 was unique? Can’t imagine how. Second, they don’t like Rasmussen’s calling method:

Just to look at one methodological issue, if no one answers the phone, Rasmussen picks a different random phone number instead of calling back two, three, four or more times as other pollsters do. Why does this matter? Because 20-somethings (who skew Democratic) are often out, whereas 60-somethings (who skew Republican) are often in. By not being persistent in finally getting through to a randomly chosen phone number, the sample is inherently biased towards Republicans because they are easier to reach.

Oh my.

Talk about a just-so story! And look, I’ve spent the past 5 days castigating righties for making up a bunch of just-so stories, so I think I’ve fully earned the right to say this: this complaint is dumb.

What is the site’s evidence that the stats are being skewed by the people who get robocalled being ‘out’ if they’re twentysomethings but ‘in’ if they’re 60-year-olds? Yeah, that’s what I thought. This explanation is no better than the righty explanation that pollsters are reaching more (D) likely voters this time because lefties lie about being likely, because they’re welfare queens who answer their phone more, etc. In other words, it’s a made-up rationalization to reach a desired result. You’re welcome to it, but don’t kid yourself.

It’s just refreshing to have a reminder that it’s not only wishful-thinking righties doing this sort of thing.

6 Comments so far

hmmm. google “rasmussen state by state presidential poll” gives

apparently it’s yours for just $3.95 p/month.

Comment by Anon.

I don’t wanna pay! Thought that would have gone without saying. Also, that’s just one pollster.

When WSJ gives me my own BLOG about the Sonic Charmer Electoral Model, they’ll budget it. Till then..

Anyway, I did just find this, looking at it now:

It’s an ironclad rule of blogging, if you blog “Why isn’t there X?” (in a whiny enough voice) you’ll find X yourself within 10 minutes.

Comment by Sonic Charmer

You don’t know where the RCP averages come from?

Click on any link in the first column and you get a full page of all the polls for that state with the “current” averaged ones highlighted. Example:

Comment by joshua

Got it. Yeah, I never clicked that far into it :)

That’s still a state-by-state process…infeasible to do that 51x…

Comment by Sonic Charmer

Whoa, you blog a lot.

My point from the earlier comment was not to say that I think statistics are inapplicable to the election.

To state my objection simply, I think publishing any one probability figure is lying.

Unlike flipping a theoretical coin, where the model and data are perfectly known, in this case both the model and the data are likely in error in unknown ways.

That is my objection to Nate – not that he uses statistics, but that publishing one probability value implies that his model is perfect.

Of course, in his blog post he says that the remaining error is due to the probability his model is in error. How can you even estimate that? A coin can be flipped for centuries and yield the same distribution. There is no reason whatsoever to think that elections are like that, given that nearly all the circumstances change between them.

Comment by Dave

The model and data are definitely in error in unknown ways. That is the lay of the land anyone looking at these elections must deal with. Nevertheless, he has a model (so do I), it has some assumptions, and it creates a reasonable output given the data. I don’t see how reporting that output is ‘lying’. And it does NOT ‘imply that his model is perfect’. I guarantee you he would say that his is just a model and it’s an (imperfect) attempt to estimate likelihoods.

You are free to question such a model and say that you think it is significantly flawed in some way. But I don’t see why you’d do that, nor do I see you stating any specific tangible objection. As you’ve seen from all my blogging, the output of Silver’s model is broadly reproducible and just not really out of line with other models, from the dumb (mine) to the perhaps less-dumb. There is just nothing anyone can point to about Silver’s model’s output that screams out ‘this is seriously wrong and needs questioning’. It’s actually so easily-verifiably not seriously wrong that it’s pretty boring and unimpressive.

You are right that elections are not like coins in that we can’t ‘flip’ them 1000000 times and confirm an assumed probability distribution. I think you are wrong that this means elections are not analyzable via standard statistical and probabilistic techniques. The real weak link in all of this (if there is one) is *not* the ‘model’, it is the polls. If the polls are wrong/systematically biased in some direction, then all bets are off.

But that goes without saying. Everyone acknowledges that. Nate Silver would be the first to say so. Indeed, he did say so on his blog.

Comment by Sonic Charmer
