Based on his expertise as a baseball handicapper, Nate Silver is trying to explain to pollsters how to do their job. He is awesome.
One of my rare disagreements with Josh Marshall came when he decided to follow the pundit tradition of excluding outliers before averaging (he's over it: TPM now averages tracking polls, which is arbitrary but simple). Silver is actually coming close to convincing me that dumping outliers often makes sense.
My position was that, unless one has reason to suspect the methodology of a poll, the best estimate of true support in a population is a sample-size-weighted average of different polls. It is a mathematical result that, if the polls differ only due to random sampling error, then such an average is a better (lower-variance) estimate of population voting intentions than a trimmed average.
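A minimal sketch of the sample-size-weighted average described above. The function name and the three polls are hypothetical, and the standard error assumes the polls differ only by simple random sampling error, which is exactly the condition under which pooling is the right thing to do.

```python
from math import sqrt

def pooled_estimate(polls):
    """polls: list of (share, sample_size) pairs.

    Returns the sample-size-weighted share and its standard error,
    assuming every poll is a simple random sample of the same population.
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    se = sqrt(pooled * (1 - pooled) / total_n)
    return pooled, se

# Hypothetical polls: (candidate's share, sample size)
polls = [(0.50, 1000), (0.52, 800), (0.44, 600)]
share, se = pooled_estimate(polls)
print(f"pooled share = {share:.3f}, standard error = {se:.3f}")
```

Under these assumptions the third poll looks like an outlier, but pooling keeps its 600 respondents rather than throwing them away.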
In particular, it is almost impossible to tell a story for why it is reasonable to drop a poll because the result is extreme and not drop a pollster because there is reason to suspect their methodology.
Polls whose results are very different from the average poll might be worthy of special methodological scrutiny, but it is a bad idea to just drop them. The fact that this is a pundit tradition is further evidence that it doesn't work.
Now I would say drop an outlier if and only if Silver comes up with an objection to the pollster's methodology. The problem is that he always can. I fear that my current approach implies dropping all polls from any pollster who reports one extreme result (Silver is very firm that the doubt is about a pollster, not a single poll by that pollster, and I totally agree). Worse, he seems to find fault with polls that are surprisingly good for McCain and, of course, he personally supports Obama.
Still, he is very convincing. In particular, a key fact about this election is that support differs sharply by age, with young people strongly for Obama. This creates huge challenges for pollsters (and the Obama campaign), since many young people who are registered to vote don't vote. This must be part of the reason for Obama's immense emphasis on the ground game, although other parts are his background as a community organizer, his sense that high turnout is not just good for his chances but good for democracy, and the fact that he has money coming out of his ears (and look at those ears).
Hence problems. Gallup has repeatedly told people to ignore results from their traditional likely voter model, which doesn't work in August. Then, when it stubbornly gave weird results in September, they introduced an alternative likely voter model.
It seems that pollsters get in trouble with Nate the Poll-Fascist (I take some words seriously) because they have trouble contacting young voters, and then their likely voter filter excludes most of them.
On day 11 of the poll, the IBD-TIPP tracking poll had McCain way ahead among young voters, 74% to 22%. This, uhm, differs from the results of other polls to an extent which can't be explained by sampling error unless the IBD-TIPP sample of young voters was tiny. Silver's theory (and I can't think of another possibility) is that, after filtering, IBD-TIPP had a tiny sample of young voters, each of whom then had a huge weight in the headline overall average, as they were weighted up to match the fraction of actual voters in past elections who were young.
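To see how weighting up a tiny subsample works mechanically, here is a rough sketch. Every number in it is hypothetical (IBD-TIPP's actual subsample size and weighting target are not given here), and the helper names are made up for illustration.

```python
from math import sqrt

def weight_per_respondent(n_group, n_total, target_share):
    """Weight that scales n_group respondents up to target_share of the sample."""
    return target_share * n_total / n_group

def subgroup_se(share, n_group):
    """Sampling standard error of a share estimated from n_group respondents."""
    return sqrt(share * (1 - share) / n_group)

# All numbers are hypothetical, chosen only to illustrate the mechanism.
n_total = 1000        # respondents left after the likely voter filter
n_young = 20          # young respondents among them
target_young = 0.15   # assumed share of young voters in past elections

w = weight_per_respondent(n_young, n_total, target_young)
headline_shift = w / n_total          # headline share moved by one young respondent
se_young = subgroup_se(0.5, n_young)  # worst-case standard error within the subgroup

print(f"weight per young respondent: {w:.1f}")
print(f"headline points moved by one young respondent: {100 * headline_shift:.2f}")
print(f"standard error of the young subgroup share: {100 * se_young:.1f} points")
```

With these made-up numbers each young respondent counts as several ordinary respondents, so a handful of unusual answers in the subgroup shows up directly in the headline figure.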
This might mean that the young likely voter sample is tiny and chance did the rest. It also might mean that, for a young person to get counted as a likely voter, the young person has to be very odd: for example, lying about their age and actually not so young (hey, I do that sometimes), or very rich, or with lots of children already, or something.
Now Silver also came down like lead on a pollster who didn't weight by age and so had a final average that would be valid only if the turnout of young voters were much lower than it was in 2004 (I can't find the link; he's posted so much since then).
Now I think one problem with likely voter filters is an absurd step in the methodology. I know Gallup does this (they explain a lot) and I assume others do too. The traditional Gallup filter estimates the probability of voting based on answers to 7 questions (including one which is similar to "have you voted before?", which is clearly biased against the young). Then they guess the turnout percentage at x, and they count as likely voters the people who have a probability of voting greater than x. This makes no sense at all. It amounts to rounding estimated probabilities to 0 or 1. They toss out responses, then multiply the remaining responses by weights to match actual voting by characteristics. This is crazy. There is no way it could be optimal. Even if the probability that someone actually votes is low, some information is contained in their statement about for whom they would vote if they voted.
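The cutoff step can be sketched in a few lines. This is only an illustration of the rounding-to-0-or-1 logic as described above, not Gallup's actual procedure; the six respondents and the function name are hypothetical.

```python
def cutoff_filter_share(respondents, cutoff):
    """respondents: list of (estimated_prob_of_voting, supports_candidate) pairs.

    Keeps only respondents whose estimated probability exceeds the cutoff
    (treating them as certain voters) and discards everyone else.
    Returns (candidate's share among those kept, number kept).
    """
    kept = [supports for prob, supports in respondents if prob > cutoff]
    return sum(kept) / len(kept), len(kept)

# Hypothetical respondents: (estimated probability of voting, 1 if for candidate A)
respondents = [(0.9, 0), (0.8, 1), (0.7, 0), (0.55, 1), (0.4, 1), (0.3, 1)]

share, n_kept = cutoff_filter_share(respondents, cutoff=0.6)
print(f"kept {n_kept} of {len(respondents)} respondents, share for A = {share:.3f}")
```

In this made-up sample the three discarded respondents all favor candidate A, and their stated preferences have no effect at all on the result, even though none of them has a zero chance of voting.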
It would make more sense to estimate the likely vote by multiplying responses by the estimated probability of voting (an aside moved down here *). If the demographic characteristics of interest (age, race, gender, marital status, income) are included in the qualitative response model which was fit to past voting, this would automatically imply that no further weighting is required to match the past election turnout data. Taking lots of numbers and multiplying by weights between 0 and 1 which add up to N implies less sampling error than averaging just N of them (if numbers between zero and one sum to N, their squares sum to no more than N).
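A small sketch of the probability-weighted estimate and of the sum-of-squares argument. The respondents are hypothetical, and the variance comparison assumes independent responses with equal variance, which is a simplification.

```python
def probability_weighted_share(respondents):
    """respondents: list of (estimated_prob_of_voting, supports_candidate) pairs.

    Each response counts in proportion to the estimated probability of voting.
    """
    total_weight = sum(prob for prob, _ in respondents)
    return sum(prob * supports for prob, supports in respondents) / total_weight

def variance_factor(weights):
    """Variance of a weighted mean divided by the variance of one response,
    assuming independent responses with equal variance."""
    return sum(w * w for w in weights) / sum(weights) ** 2

# Hypothetical respondents whose voting probabilities sum to N = 4
respondents = [(0.9, 0), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 1), (0.5, 1)]
probs = [prob for prob, _ in respondents]

weighted_share = probability_weighted_share(respondents)
weighted_factor = variance_factor(probs)           # all six, weighted by probability
unweighted_factor = variance_factor([1, 1, 1, 1])  # plain average of only N = 4

print(f"probability-weighted share for A: {weighted_share:.3f}")
print(f"variance factor, weighted: {weighted_factor:.3f}")
print(f"variance factor, plain average of 4: {unweighted_factor:.3f}")
```

The weighted factor comes out smaller because the squared probabilities sum to less than N, so dividing by N squared gives less than 1/N.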
The only problem is that the estimate of the likely vote is not an average over "likely voters". So? Poll results already aren't fractions among a sample of likely voters, as likely voters are weighted to match recorded voting behavior in past elections. It does mean that the likely voter sample size is not a whole number (again, true with weights). It is still possible to calculate standard errors due to random sampling (which will be smaller than with the traditional approach). I think it is honest to back out an equivalent likely voter sample size from those standard errors (so long as the pollster explains what they are doing).
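One way to back out an equivalent sample size, sketched under the assumption of independent responses with equal variance. The first function is the formula usually called Kish's effective sample size; the second inverts the usual binomial standard error. Function names and numbers are hypothetical.

```python
from math import sqrt

def effective_sample_size(weights):
    """(sum of weights)^2 / (sum of squared weights).

    Equals the number of respondents when all weights are equal."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def sample_size_from_se(share, se):
    """Sample size of a simple random sample that would give this standard error."""
    return share * (1 - share) / se ** 2

# Hypothetical estimated voting probabilities used as weights
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.5]
n_eff = effective_sample_size(probs)
print(f"sum of probabilities: {sum(probs):.1f}, effective sample size: {n_eff:.2f}")

# Round trip: a share of 0.5 with the standard error of a 400-person sample
n_back = sample_size_from_se(0.5, sqrt(0.25 / 400))
print(f"backed-out sample size: {n_back:.0f}")
```

The effective sample size is not a whole number, and in this example it is larger than the sum of the probabilities, which is the sense in which the standard errors are smaller than with a cutoff.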
This by Silver is very good, but I just found it when looking for the link immediately above so I don't have anything to add to it.
* One could also get estimates for different turnouts by fiddling with the constant term in the voting probability estimate (although this is only optimal if you assume that disturbances to latent variables must be normal or extreme value, and that assumption is demonstrably false, as shown by Jonathan Nagler et al.). In this case, the assumption is that shifts in enthusiasm affect everyone equally and that the distribution of the unobservable (desire to vote minus desire to do something else with one's time), which implies voting if it is greater than 0, is the same as the assumed distribution for a probit (normal) or a logit (extreme value). Such an approach can't handle the idea that there will be increased turnout due to something specific to young people or African Americans which, you know, is plausible this time. I'd leave "turnout might be different this time" as a warning, and not pretend one can quantify it. Or report results with various turnout scenarios.
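Fiddling with the constant term can be sketched as follows, assuming a logit model for concreteness. The same shift is added to every respondent's latent index, which is precisely the equal-shift assumption criticized above. The probabilities, the target turnout, and the function names are all hypothetical.

```python
from math import exp, log

def logistic(z):
    return 1.0 / (1.0 + exp(-z))

def logit(p):
    return log(p / (1.0 - p))

def shift_probs(probs, delta):
    """Add delta to every respondent's logit index (an equal shift for everyone)."""
    return [logistic(logit(p) + delta) for p in probs]

def shift_for_turnout(probs, target, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection for the constant-term shift that makes mean turnout equal target.

    Requires every probability to be strictly between 0 and 1."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        mean_turnout = sum(shift_probs(probs, mid)) / len(probs)
        if mean_turnout < target:
            lo = mid   # turnout too low, so the shift must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical estimated voting probabilities (mean turnout about 0.67)
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.5]
delta = shift_for_turnout(probs, target=0.75)
new_probs = shift_probs(probs, delta)
print(f"shift in constant term: {delta:.3f}")
print(f"new mean turnout: {sum(new_probs) / len(new_probs):.3f}")
```

Running it for several target values gives one set of probability weights per turnout scenario, but it cannot represent a surge confined to one group.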