Sunday, October 28, 2012

Polling Aggregator Obsession

The 2012 presidential race is so close, just so close, that not only do different polls show different results, but so do different averages of polls.

OK, it is clear that extremely fancy models such as the fivethirtyeight model, which attempt to assign undecideds and use state data to estimate the national vote and vice versa, are different (generally I don't like fancy, but Nate Silver has a track record).  But simple smoothers are also different.

Partly it is the choice of which polls to include: yes or no on web-based or partisan polls.  www.realclearpolitics.com reports a simple average, but they don't include partisan polls or web-based polls (and they use self-reported partisanship, so Rasmussen is included but not PPP, ooops).

But part of it is the smoothing algorithm.  Both www.talkingpointsmemo and http://elections.huffingtonpost.com/pollster use loess smoothers, which fit a constant and a coefficient on time, using weights which decline with distance from time t, and report the fitted value for t.  I think this means they are too quick to extrapolate trends.  In practice, one outlier can convince the computer that the polls have been trending for a candidate.
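To make the mechanics concrete, here is a minimal sketch of a local linear fit of the kind described above. It is not the code either site runs: the tricube weight function, the 10-day bandwidth, the function names, and the poll numbers are all assumptions made up for illustration. It compares the local linear fitted value on the last day with a plain weighted average that uses the same weights.

```python
def tricube(u):
    """Tricube weight: 1 at u = 0, falling to 0 once |u| >= 1 (an assumed kernel)."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def weighted_mean(times, margins, t, bandwidth):
    """Kernel-weighted average of the margins around time t (no time trend)."""
    w = [tricube((x - t) / bandwidth) for x in times]
    total = sum(w)
    if total == 0:
        raise ValueError("no polls within the bandwidth")
    return sum(wi * yi for wi, yi in zip(w, margins)) / total


def local_linear(times, margins, t, bandwidth):
    """Weighted least squares fit of margin = a + b * (time - t).

    Returns (a, b): a is the fitted value at time t, b is the estimated trend.
    """
    w = [tricube((x - t) / bandwidth) for x in times]
    total = sum(w)
    if total == 0:
        raise ValueError("no polls within the bandwidth")
    xs = [x - t for x in times]  # center time on t so the intercept is the fit at t
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / total
    my = sum(wi * yi for wi, yi in zip(w, margins)) / total
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, margins))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


# Hypothetical data: ten daily polls, Obama margin in points (not real polls).
days = list(range(1, 11))
steady = [4.0] * 10
with_outlier = [4.0] * 9 + [-2.0]  # same polls, but the last one is an outlier

for label, margins in (("steady", steady), ("with outlier", with_outlier)):
    level, slope = local_linear(days, margins, t=10, bandwidth=10)
    mean = weighted_mean(days, margins, t=10, bandwidth=10)
    print(label, "local linear:", round(level, 2),
          "slope:", round(slope, 2), "weighted mean:", round(mean, 2))
```

On these made-up numbers, the single late outlier produces a negative fitted slope, and the local linear estimate for the last day falls below the plain weighted average because that slope is carried through to day 10.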

My example: the odd case of TalkingPointsMemo and Michigan.  The program has Ohio as leans Obama and Michigan as a tossup.  This is due to one poll by Foster-McCallum (in serious contention for the bitterly fought prize for worst pollster).  Without them, the computer estimates that Obama is 3.9% ahead.  That is enough for TPM to call the state leans Obama and to put more than 269 electoral votes in Obama-leaning states.

Here is the graph with Foster-McCallum




Here is the graph without them



This is an extreme case, because Foster-McCallum had a Florida poll with an absurd sample, which earned the coveted and very rare double asterisk for being excluded from the average because of an editorial decision.

I think such decisions should be made pollster by pollster, not poll by poll.  But totally aside from that, a sensible smoothing algorithm shouldn't be so sensitive to one poll when ten October polls are available, all of them conducted after the first TV debate.

Notice how the poll pulls down estimates of the state of the race before its sample began.  This feature of the smoother makes it hard to see if a shift is due to an event.
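The backward leak can be shown with a small sketch. This is only an illustration under assumptions: the tricube weights, the 5-day bandwidth, the function name kernel_average, and the poll numbers are all invented here, and the trailing average is just one simpler alternative, not a method either site uses. A two-sided window lets a poll taken on day 10 change the estimate for day 8, while a trailing window that only looks backward cannot.

```python
def kernel_average(times, margins, t, bandwidth, trailing=False):
    """Weighted average of poll margins around day t.

    trailing=False: two-sided window, uses polls both before and after t.
    trailing=True:  one-sided window, uses only polls taken on or before t.
    """
    num = 0.0
    den = 0.0
    for x, y in zip(times, margins):
        if trailing and x > t:
            continue  # a trailing average never sees polls from after day t
        u = abs(x - t) / bandwidth
        w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight (assumed)
        num += w * y
        den += w
    if den == 0:
        raise ValueError("no polls within the window")
    return num / den


# Hypothetical polls: day of poll and Obama margin in points (not real data).
days = [1, 3, 5, 7, 9]
margins = [4.0, 4.0, 4.0, 4.0, 4.0]

# The same polls plus one outlier taken on day 10.
days_out = days + [10]
margins_out = margins + [-2.0]

# Estimate the state of the race on day 8, two days BEFORE the outlier.
for trailing in (False, True):
    before = kernel_average(days, margins, 8, 5, trailing=trailing)
    after = kernel_average(days_out, margins_out, 8, 5, trailing=trailing)
    name = "trailing" if trailing else "two-sided"
    print(name, "without outlier:", round(before, 2),
          "with outlier:", round(after, 2))
```

With the two-sided window, the day 8 estimate drops once the day 10 outlier is added. With the trailing window it does not move, so a shift in the curve can only come from polls taken by that date.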

I say even simpler is better than loess.
