One of the odder aspects of this pre-election polling season has been the mystery of the Party-ID. As I have only learned this year, current Best-Practices Polling Methodology apparently involves (a) polling people, (b) asking what party they’re in, and then (c) re-weighting the answers from (a) according to what the pollster believes the ‘real’ party makeup of the electorate (as opposed to those he polled) is.
[SLIGHTLY DEVASTATING UPDATE: After actually reading up on poll methodology some, rather than just (R) complaints about it, I'm not at all sure the above is the case. It seems like what most polls do, rather, is build a demographic model of what they think the electorate is like, and try to sample so that the demographics match that model? If the resulting sample ends up D+10 that's how the cookie crumbles. In a way this is still weird but differently so, and still amounts to baking a fudge factor into their polling - but depending on whether you buy that you may or may not consider the rest of the post, which I'm leaving intact, even more BS than my usual.]
[UPDATE 2: Just for completeness, here's one of the articles that made the rounds claiming explicitly that pollsters weight results to match a turnout model, and here's one saying no they don't, that's only the cheap ones, the good ones weight results to match a demographic model. I just. I don't know what to think now.]
Importantly, the ‘real’ party-makeup of the electorate is gauged by looking at the results of prior elections.
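To make steps (a) through (c) concrete, here is a minimal Python sketch of that kind of re-weighting as I understand it. Every number below is invented for illustration, and the names (`sample_mix`, `target_mix`, `obama_support`, `party_weights`, `obama_margin`) are my own, not any pollster's.

```python
# Illustrative sketch only: all numbers are invented, not taken from any real poll.

# (a) + (b): the party mix among the people actually reached, and the share of
# each party's respondents who say they back Obama (the rest back Romney).
sample_mix = {"D": 0.33, "R": 0.37, "I": 0.30}
obama_support = {"D": 0.92, "R": 0.06, "I": 0.45}

# (c): the party mix the pollster believes the 'real' electorate will have,
# e.g. taken from a prior election. This is the dial.
target_mix = {"D": 0.39, "R": 0.32, "I": 0.29}


def party_weights(sample_mix, target_mix):
    """Weight for each respondent of a given party: target share / sample share."""
    return {p: target_mix[p] / sample_mix[p] for p in sample_mix}


def obama_margin(sample_mix, obama_support, weights=None):
    """Obama-minus-Romney margin in points, optionally with per-party weights."""
    if weights is None:
        weights = {p: 1.0 for p in sample_mix}
    total = sum(sample_mix[p] * weights[p] for p in sample_mix)
    obama = sum(sample_mix[p] * weights[p] * obama_support[p] for p in sample_mix)
    share = obama / total
    return 100.0 * (2.0 * share - 1.0)  # two-candidate race: Romney = 1 - Obama


raw_margin = obama_margin(sample_mix, obama_support)
weights = party_weights(sample_mix, target_mix)
weighted_margin = obama_margin(sample_mix, obama_support, weights)

print(f"raw margin:      {raw_margin:+.1f}")
print(f"weighted margin: {weighted_margin:+.1f}")
print(f"swing from re-weighting: {weighted_margin - raw_margin:+.1f} points")
```

With these made-up inputs the raw sample leans Romney and the re-weighted one leans Obama. The responses themselves never changed; only `target_mix` did.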
Now, I am not a statistician or pollster but this is truly weird to me. Naively, it is not how I would have ever designed a poll from scratch. It seems to insert an extra moving-part, an extra thing that you have to model, into your statistics for no good reason. Instead of estimating f(x) they have decided that f(x)=g(h(x)): and now they have to estimate both g(*) and h(*). Needless to say, errors can pile up quickly when you do this sort of thing. (Why stop at party-ID for example? Why not also ask whether the people you polled are left-handed or right-handed, and then ‘correct’ for the handedness-breakdown not matching that of the country? There’s virtually no end to the ‘corrections’ one could come up with.)
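To see how an error in the extra modeled quantity carries straight into the final number, here is a small Python sketch. The per-party support figures and the fixed 30% independent share are assumptions invented for illustration, and `margin_for_gap` is a hypothetical helper of mine, not anyone's actual method.

```python
# Illustrative sketch only: the support figures are assumptions, not poll data.
OBAMA_SUPPORT = {"D": 0.92, "R": 0.06, "I": 0.45}
INDEPENDENT_SHARE = 0.30  # held fixed so only the D-vs-R gap varies


def margin_for_gap(d_minus_r):
    """Obama-minus-Romney margin (points) for an assumed D-minus-R party gap."""
    partisan = 1.0 - INDEPENDENT_SHARE
    mix = {
        "D": (partisan + d_minus_r) / 2.0,
        "R": (partisan - d_minus_r) / 2.0,
        "I": INDEPENDENT_SHARE,
    }
    obama = sum(mix[p] * OBAMA_SUPPORT[p] for p in mix)
    return 100.0 * (2.0 * obama - 1.0)  # two-candidate race: Romney = 1 - Obama


# Sweep the assumed party gap from D-4 to D+8 and watch the topline move.
for gap_points in range(-4, 9, 2):
    m = margin_for_gap(gap_points / 100.0)
    print(f"assumed party gap D{gap_points:+d}: Obama margin {m:+.1f}")
```

Under these assumptions each point of error in the assumed party gap moves the reported margin by 0.86 points, so a guess about the electorate that is off by ten points moves the topline by about 8.6.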
And this is not a small effect either. It is now routine to hear claims, from conservatives desperately seeking some cause for comfort (and also from Borepatch), that due to the Party-ID Problem, all the polls that show Obama 3-5 points up are ‘really’ showing Romney 3-5 points up, if you re-weight the party ID properly (or undo the pollsters’ weighting). In other words, this fuzzy Party-ID re-weighting procedure, whatever it is, seems to be capable of generating not the 1- or 2-point swings one might have imagined to be tolerable, but 8- or 10-point swings.
That is insane. Especially considering that all these polls advertise having a ‘margin of error’ of 3 points or some such. When the Party-ID correction can cause such a swing you may as well throw the actual poll out the window because it is just some random noise you are throwing on top of a model of Party-ID that you have built. And when we hear about these ‘poll results’, what we are really being shown is more or less just the result of them hitting F9 on their dynamic Party-ID model. I have never, in any prior election season, heard of such a big effect stemming solely from the Party-ID correction. So what is going on here? Is there a method to this madness? I think there is.
Conservatives of course think it traces to bias. The pollsters and media are all in the tank for Obama, and want to demoralize conservative voters, so they are cooking the books. One problem with this theory is that (even if that’s what the biased pollsters/media are trying to do) who’s to say the outcome would be ‘demoralize’ rather than ‘fire up’? The bigger problem with this theory though is that it sounds insane.
But the problem with just saying the theory is insane is that modern-day poll methodology virtually asks for such a theory. Because (unless I am misinterpreting) it really does seem to be the case that all these pollsters use models which contain an arbitrary, internal dial – how/whether to re-weight for Party-ID – that as far as I have been able to tell, pollsters can tweak however they like and whose details they don’t share, or perhaps even consider proprietary. If your polling methodology amounts to: ‘we poll a small group of people, and then multiply the results by a secret factor whose provenance we respectfully decline to explain’, all I can say is don’t come complaining to me with ‘they’re attacking SCIENCE!’ when people accuse you of cooking the results, because I’m not going to be very sympathetic.
But I don’t for one second believe all the pollsters are ‘biased’, nor do I think they are stupid. If they were, this would have led to severe poll vs. election mismatches that would have come out in prior elections. I do recall some noise that Bush v. Gore deviated from pre-election polls, but nothing like the Party-ID effect we’re seeing here. We would have seen a much bigger effect than that if the Party-ID correction were such a problem. So in fact, when pollsters say (as I’m sure they would) that they need the party-ID correction to get their poll predictions to match the outcome, I guess I’m inclined to believe them. There may be some amount of overcorrection going on right now due to over-fitting to the 2008 election, which was an outlier (lots of people expressing disgust for Bush, (R) voters suppressed, lots of voters voting for Obama so they could prove how non-racist they were, etc.) But I doubt it’s anything so dramatic as could turn an Obama 3-5 point “polling” (i.e. polling model) advantage into a Romney 5+ point victory.
After all, keep in mind, I do still expect Obama to win in November. After the votes are counted, I mean.
So (if I’m right) we seem to have a situation where every single pollster, whenever they do a poll, needs to ‘correct’ all their (D) results upward and their (R) results downward by, let’s say, 5% each. They all think they need to do this because the (D) vs. (R) breakdown they observe in their polls doesn’t match what they saw in 2008/10 and (therefore) what they expect to see in the election. And let’s ditch the insane-sounding theory that they’re all doing this intentionally because they’re in the tank for Obama. This gives rise to a mystery.
To spell out the mystery, this would mean that when pollsters randomly sample ‘likely voters’, they consistently get one sort of (D)/(R) breakdown, but whenever elections are actually held – and presumably in November this will happen too – they get a significantly different (D)/(R) breakdown. Every time. So predictably, and so significantly, that they need to include it as an explicit correction in their model. And again, let’s assume they’re not doing this just because they’re biased; they really are trying to construct the most accurate polls they can.
But why, and how, could there be such a consistent and persistent mismatch between every random sample of LVs in every poll and the likely party-ID makeup of the electorate as counted after the election?
I think it can only mean that the pollsters have refined their polls so effectively that they are picking up the effect of election cheating.
Think about it. If all your random unbiased samples of likely voters come out 50-45 in favor of the (R), but then the ‘election turnout’ (i.e., after the votes are ‘counted’…) ends up being so different from your samples that the (D) wins 50-45, i.e. a 10-point swing, the most likely explanation is the very simple and straightforward one that the election was stolen via cheating in the vote-counting process. Somewhere between likely voters becoming actual voters, and those votes being counted, surely at least some cheating – ballot-box stuffing, ‘lost’ ballots, votes misregistered, etc. – took place. That cheating more or less causes an 8-10 point swing in party-ID each election (i.e. between actual voters and the results after they’ve been ‘counted’), and this is what we’re seeing all modern polls having to ‘correct’ for. Simple.
These polling models are picking up (albeit inadvertently) the effect of this cheating via the fact that their polled samples never seem to match their predicted election results when it comes to party breakdown. But of course they don’t match. The election results come from ‘vote counting’, and cheating takes place when votes are counted. Now before you protest, yes surely there is cheating on both sides, but nowadays and in the current environment it would appear the (D)s effectively have perhaps a baked-in 8-10 point advantage in the cheating arms-race and so all election-outcomes (vs. unadjusted polls) have to reflect that. In particular, the party-ID breakdown you see when you look at election results reflects this cheating asymmetry. And as it turns out, (D)s just plain cheat better than (R)s right now. This could change at any time of course, and then we would see the Party-ID correction break the other way.
So it’s really very simple: there is no giant conspiracy among pollsters and media, and ‘bias’ isn’t unduly affecting their results. They are, in the main, conscientiously trying to build and update models of the election outcome that are as accurate as possible. It’s just that the election outcome will have cheating in it while all these polls don’t, not really (what would be the point?). So pollsters have to correct for that to be accurate. What they are calling a Party-ID correction would probably be more accurately termed an ‘election-cheating fudge factor’, but we can forgive them this polite terminology.
The bottom line is, conservatives aren’t wrong to think it ‘looks weird’ that all polls need to be corrected D+5 or D+10: they are just misdiagnosing and overthinking the reason. The reason is just election cheating that tends to give (D)s an extra 5-10 points or so. Whenever you see a poll with that correction in it, just mentally treat that correction as the poll’s implicit prediction of how much one party will successfully cheat by. Nowadays polls appear to predict a 5-10 point advantage in cheating for the (D)s. Once you understand that, everything becomes clear and simple.
So, now you can sleep better at night knowing that. You’re welcome.