Tuesday, September 07, 2004

Strange statistical analysis from Rasmussen
(Via Daily Kos.) I commented on this sort of analysis when Joshua Marshall reported that pollsters do it. Now a real pollster is doing it in public.

Rasmussen has to deal with the fact that his latest two daily reports of the 3-day moving average show a much closer race than the Time and Newsweek polls. In part, he concedes that the Rasmussen results from Saturday might be funny and proposes ignoring them. He certainly doesn't hint that he knows of any specific problem with the sampling; he just writes:

"Our current poll (showing the President ahead by just over a point) includes a Saturday sample that is way out of synch with all the days before it and with the Sunday data that followed. In fact, Saturday's one-day sample showed a big day for Kerry while all the days surrounding it showed a decent lead for the President.

It seems likely that Saturday reflects a rogue sample (especially since it was over a holiday weekend). But, it remains in our 3-day rolling average for one more day (Tuesday's report). If we drop the Saturday sample from our data, Bush is currently ahead by about 4 percentage points in the Rasmussen Reports Tracking Poll. "

This is not a sound argument. Rasmussen seems to assert that Saturday differs from the other days because of sampling error that happened to be large that day. Another possibility is that his employees played a little joke and just made up the numbers. If Rasmussen places a non-trivial subjective probability on the second explanation, he should do something other than ignore the results from that day. I'm sure he considers such a possibility vanishingly unlikely, and I agree with him. But a difference that is statistically significant at the usual 5% level occurs, by definition, for about one in twenty pairs of valid polls; that is, it happens often.
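To get a feel for how often a perfectly valid day looks "way out of synch," here is a small simulation sketch. Everything in it is an assumption for illustration, not Rasmussen's method: each day's estimate is the same true margin plus independent normal sampling error, measured in units of one day's standard error, and a day is flagged when it differs from the average of the days before and after it by more than 1.96 standard errors of that difference. The function names are hypothetical, and the 56-day period is roughly the eight weeks between this post and the election.

```python
import math
import random

def flag_out_of_synch(series, z=1.96):
    """Flag interior days whose estimate differs from the average of the two
    neighbouring days by more than z standard errors of that difference.
    Estimates are in units of one day's standard error, so
    x[t] - (x[t-1] + x[t+1]) / 2 has standard deviation sqrt(1.5)."""
    threshold = z * math.sqrt(1.5)
    return [abs(series[t] - (series[t - 1] + series[t + 1]) / 2) > threshold
            for t in range(1, len(series) - 1)]

def simulate_tracking(periods=5_000, days=56, seed=7):
    """Hypothetical tracking polls in which every day is a valid sample."""
    rng = random.Random(seed)
    flagged_days = checked_days = periods_with_flag = 0
    for _ in range(periods):
        # Same true margin (0) every day, plus independent sampling error.
        series = [rng.gauss(0.0, 1.0) for _ in range(days)]
        flags = flag_out_of_synch(series)
        flagged_days += sum(flags)
        checked_days += len(flags)
        periods_with_flag += any(flags)
    return flagged_days / checked_days, periods_with_flag / periods

day_rate, period_rate = simulate_tracking()
print(f"share of valid days flagged: {day_rate:.3f}")
print(f"share of 56-day periods with at least one flagged day: {period_rate:.3f}")
```

About one valid day in twenty gets flagged, and most simulated 56-day tracking periods contain at least one such day, even though nothing went wrong with any sample.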

Assuming Rasmussen doesn't doubt that the Saturday poll really was conducted according to standard Rasmussen methodology, it is not a good idea to drop it. The proposed procedure appears to be to average three daily polls unless one day's results are significantly different from the results on the other two days. Because sampling error is almost exactly normally distributed, the plain three-day average is the minimum-variance unbiased estimator, so this procedure would give a worse distribution of estimates (a mean-preserving spread) than the standard procedure of averaging over three days, even when one day stands out. This is well known, and was proven decades ago in two different ways, once by Cramér and Rao and once by Rao and Blackwell. Surely Rasmussen knows this but finds the desire to reassure readers more pressing than the fear of Cramér turning over in his grave.
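A Monte Carlo sketch makes the efficiency point concrete. It assumes (these are illustrative assumptions, not Rasmussen's stated rule) that each day's estimate is the true margin plus normal sampling error with a known standard error `sigma`, and it reads the proposed rule as: drop the day farthest from the mean of the other two if that difference exceeds 1.96 standard errors, otherwise take the plain average. The function names are hypothetical.

```python
import math
import random
import statistics

def three_day_average(days):
    """Standard procedure: the plain mean of three daily estimates."""
    return sum(days) / len(days)

def drop_outlier_average(days, sigma, z=1.96):
    """Hypothetical version of the proposed rule for exactly three days.
    If the day farthest from the mean of the other two differs from it by
    more than z standard errors of that difference (sigma * sqrt(1.5)),
    drop it and average the remaining two days."""
    total = sum(days)
    deviations = [d - (total - d) / 2 for d in days]
    k = max(range(len(days)), key=lambda i: abs(deviations[i]))
    if abs(deviations[k]) > z * sigma * math.sqrt(1.5):
        return (total - days[k]) / 2
    return total / 3

def simulate_estimators(trials=100_000, true_margin=0.0, sigma=1.0, seed=2004):
    rng = random.Random(seed)
    plain, trimmed = [], []
    for _ in range(trials):
        # Three valid daily samples: true margin plus normal sampling error.
        days = [rng.gauss(true_margin, sigma) for _ in range(3)]
        plain.append(three_day_average(days))
        trimmed.append(drop_outlier_average(days, sigma))
    return plain, trimmed

plain, trimmed = simulate_estimators()
for name, estimates in (("plain 3-day average", plain), ("drop-the-outlier rule", trimmed)):
    print(f"{name}: mean {statistics.fmean(estimates):+.3f}, "
          f"variance {statistics.pvariance(estimates):.3f}")
```

Both estimators are centred on the true margin, but the plain average's variance should come out near sigma²/3 ≈ 0.333 while the drop-the-outlier rule's is noticeably larger. Under normality the plain mean is independent of the day-to-day deviations that trigger the rule, so the rule only adds zero-mean noise to it, which is exactly a mean-preserving spread.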

Now Rasmussen might be considering Thursday, Friday, Sunday and Monday when deciding that Saturday should be tossed. That is, days outside the three-day moving average are not included in the reported average but are used to decide whether to remove a day. I see no logic in this approach. One could just as well report weekly averages and toss a day's data only if it is significantly far from the neighboring days (the opposite of what Rasmussen does).

Rasmussen's comment on Time and Newsweek is more interesting:

" Today, it seems likely that Time and Newsweek included too many Republicans.
Time reports that Republicans will vote for Bush by an 89% to 9% margin; Democrats for Kerry by an 80% to 9% margin; and, unaffiliated voters for Bush 43% to 39%.

Four years ago, 35% of voters were Republicans, 39% were Democrats, and the rest were unaffiliated. If you apply those percentages to the Time internals, you find Bush up by about 3 percentage points. If you do the same with the Newsweek internal numbers, you find Bush with a six point lead. Those results are very close to the Rasmussen Reports data (excluding the Saturday sample). "
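The reweighting Rasmussen describes is just a weighted average of the within-party results, using the four-years-ago party shares as weights. A minimal sketch using only the rounded Time figures and party shares quoted above (the variable and function names are hypothetical):

```python
# Party shares of the electorate four years earlier, as quoted by Rasmussen.
weights = {"Republican": 0.35, "Democrat": 0.39, "Unaffiliated": 0.26}

# Time internals as quoted: (Bush %, Kerry %) within each group.
time_internals = {
    "Republican": (89, 9),
    "Democrat": (9, 80),
    "Unaffiliated": (43, 39),
}

def reweight(internals, weights):
    """Overall (Bush, Kerry) shares if each group is weighted by its assumed share."""
    bush = sum(weights[g] * internals[g][0] for g in weights)
    kerry = sum(weights[g] * internals[g][1] for g in weights)
    return bush, kerry

bush, kerry = reweight(time_internals, weights)
print(f"Bush {bush:.2f}, Kerry {kerry:.2f}, margin {bush - kerry:+.2f}")
# → Bush 45.84, Kerry 44.49, margin +1.35
```

With only these rounded inputs the reweighted Time margin comes out to about 1.4 points for Bush, smaller than the "about 3" in the quote; the quote does not say what unrounded figures or other inputs went into Rasmussen's number. The Newsweek internals are not quoted, so they are not reproduced here.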

This is interesting. There is a reasonable argument for this procedure. The idea is that party affiliation changes slowly, so much of the poll-to-poll variation in the party mix is sampling error. It seems extreme to use 4-year-old data instead of the frequencies of party affiliation in a pooled sample of many recent polls. More to the point, even if party affiliation generally changes slowly, there was just a party convention. This might have caused a negative Democratic affiliation bounce (I personally don't like sharing a party with Zell Miller but will stick with it). The assumption that party affiliation changes little, so that shifts in it measure sampling error, is least defensible when trying to estimate a convention bounce.

I would like to add that one does not have to choose between raw and party-affiliation-normed numbers (as calculated by Rasmussen). One could use a weighted average of the two, and I think there is a reasonable way to choose the weights. Different polls taken around the same time measure nearly the same reality with completely independent sampling errors; over a short window, changes in candidate support are small compared with measurement error. This is why the average of several polls is a better estimator than the best of the polls. It also means that the best degree of party norming could be estimated by a regression that tries to predict the results of other polls taken at about the same time.
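Here is a sketch of that regression on simulated data. The model is a hypothetical toy, not anything Rasmussen or the pollsters use: in each period two independent polls are taken; each raw number is the true margin plus sampling error in the party mix plus other sampling error; norming to fixed party weights removes the party-mix error but also removes any real shift in party affiliation. Using poll A's blend `w * normed + (1 - w) * raw` to predict poll B's raw number gives a regression through the origin for `w`. Because poll B's errors are independent of poll A's, minimizing prediction error for B also minimizes A's error relative to the truth.

```python
import random

def fit_norming_weight(polls_a, polls_b):
    """Least-squares weight w for the blend w * normed + (1 - w) * raw.

    Each element of polls_a / polls_b is a (raw, normed) pair for one poll;
    pairs at the same index were taken at about the same time. Predicting
    poll B's raw number from poll A's blend rearranges to
    raw_b - raw_a = w * (normed_a - raw_a) + error.
    """
    num = sum((a_norm - a_raw) * (b_raw - a_raw)
              for (a_raw, a_norm), (b_raw, _) in zip(polls_a, polls_b))
    den = sum((a_norm - a_raw) ** 2 for a_raw, a_norm in polls_a)
    return num / den

def simulate_polls(periods=50_000, mix_sd=2.0, shift_sd=1.0, other_sd=3.0, seed=1):
    """Hypothetical toy model with two independent polls per period."""
    rng = random.Random(seed)
    polls_a, polls_b = [], []
    for _ in range(periods):
        truth = rng.gauss(0.0, 5.0)       # true margin this period
        shift = rng.gauss(0.0, shift_sd)  # real party-mix change since the weights were set
        for polls in (polls_a, polls_b):
            mix_noise = rng.gauss(0.0, mix_sd)      # sampling error in the party mix
            other_noise = rng.gauss(0.0, other_sd)  # all other sampling error
            raw = truth + mix_noise + other_noise
            normed = truth - shift + other_noise    # norming removes mix noise AND the real shift
            polls.append((raw, normed))
    return polls_a, polls_b

w_quiet = fit_norming_weight(*simulate_polls(shift_sd=1.0))
w_convention = fit_norming_weight(*simulate_polls(shift_sd=4.0))
print(f"fitted weight on normed numbers, quiet period:       {w_quiet:.2f}")
print(f"fitted weight on normed numbers, large real shift:   {w_convention:.2f}")
```

In this toy model the fitted weight should approach mix_sd² / (mix_sd² + shift_sd²), about 0.8 when real shifts in party affiliation are small and about 0.2 when they are large, as they might be right after a convention.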

Again, this is least reliable around a convention. I would guess that a few hours during a convention move opinions more than a week of events in April. The approach is based on the (false) working assumption that a day is a day is a day.


