Robert's Stochastic thoughts

Wednesday, September 22, 2004

ARG reports polls for 50 states and DC
Bottom line: it's close. The American Research Group (ARG) calculates a national average, weighting by state population, and gets
Bush 47% Kerry 46%. This is interesting because their polls were all taken after the RNC and their sample of 30,600 is larger than that of all the national polls put together (less than 20,000). ARG does not report a standard error for their national estimate, because it would be embarrassingly low.

It is not quite as low as it would be with a national sample of 30,600, since the 600 in, say, California each get a weight much greater than 1/30,600. I calculate the sampling standard error as roughly 0.4%. This means that a 95% interval for Bush is roughly 46.2%-47.8% and that the 95% confidence interval for Bush-Kerry is roughly -0.6% to 2.6%. The number which corresponds to the +/- reported by pollsters is about 0.8%. I did the calculations with pencil and paper, so don't trust them.
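For anyone who wants to redo the pencil-and-paper step, here is a minimal sketch of the calculation, assuming each state poll is a simple random sample of 600 and the national number is a population-weighted average of the state results. The function name and the toy population figures are made up for illustration; they are not census numbers.

```python
import math

def stratified_se(state_weights, n_per_state=600, p=0.47):
    """Standard error of a population-weighted average of state polls.

    Assumes every state poll is an independent simple random sample of
    n_per_state respondents and that support is about p in every state.
    """
    total = sum(state_weights)
    shares = [w / total for w in state_weights]
    # Each state's sampling variance is scaled by its squared weight.
    variance = sum(s ** 2 * p * (1 - p) / n_per_state for s in shares)
    return math.sqrt(variance)

# Hypothetical populations (in millions), NOT real census data.
toy_populations = [30, 20, 10, 5, 1]

se_weighted = stratified_se(toy_populations)
se_equal = stratified_se([1] * len(toy_populations))

print(f"unequal weights: {100 * se_weighted:.2f}%")
print(f"equal weights:   {100 * se_equal:.2f}%")
```

With equal weights the result is the same as for one pooled national sample. Unequal weights always push the standard error up, which is why the ARG estimate is less precise than a single national sample of 30,600 would be.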

Another way of putting it is that the ARG result counts like 25 national polls, each with a sample of 600, or about 15 national polls with a normal sample size, each with Bush ahead by 1%.
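The conversion from a standard error to an equivalent number of polls can be sketched as follows. This assumes simple random sampling and uses the rounded 0.4% standard error from above, so the output lands near, not exactly on, the 25 and 15 quoted here. The helper name and the "normal" sample size of 1,000 are my assumptions.

```python
def effective_sample_size(se, p=0.5):
    """Size of a simple random sample that would give standard error se."""
    return p * (1 - p) / se ** 2

n_eff = effective_sample_size(0.004)  # 0.4% expressed as a fraction

print(f"effective sample size: about {n_eff:.0f}")
print(f"equivalent polls of 600:  about {n_eff / 600:.0f}")
print(f"equivalent polls of 1000: about {n_eff / 1000:.0f}")
```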

If you decide to trust all polling agencies equally, the thing to do is to take an average weighted by the inverse of the square of the reported standard error. This means that estimates of Bush's lead in September would be roughly halved by the ARG result! From 3-5% to 2-3%. If you have decided to ignore Gallup, CBS and Time, you have to decide whether to ignore ARG too.
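A minimal sketch of that inverse-variance average is below. The inputs are illustrative only: a 4-point lead with a 0.7-point standard error stands in for the pooled national polls (a number I made up, not taken from any poll), and a 1-point lead with a 0.8-point standard error stands in for ARG.

```python
import math

def inverse_variance_average(estimates, std_errors):
    """Pool estimates, weighting each by 1 / (standard error squared)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical leads in percentage points, for illustration only.
leads = [4.0, 1.0]
std_errors = [0.7, 0.8]

pooled_lead, pooled_se = inverse_variance_average(leads, std_errors)
print(f"pooled lead: {pooled_lead:.1f} +/- {pooled_se:.1f} (one standard error)")
```

Because the two standard errors are similar, the pooled lead ends up not far from the midpoint of the two inputs, which is the sense in which one big poll can pull the average down so much.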

Why did I say embarrassingly low?
I think pollsters use small samples only partly to save money, and also to give themselves an excuse if their numbers are off. With a huge sample, a difference between the poll and the election would imply a more worrisome problem: a biased sample, a faulty likely voter filter, or a psychological difference between talking to a pollster and actually voting. It is clear that some or all sampling techniques give biased samples, because the spread of polls is too large to explain with sampling error alone. Polling agencies certainly don't want to spend money to prove that they are one of the agencies with a defective sampling technique.

In any case, my reaction is anything but ARG!

Update: I have managed to read the census estimates of populations for July 2003 (most recent on their website) into Excel and can report that Excel and I have roughly the same views about arithmetic. The +/- as reported by pollsters would be 0.868% (not wanting to square the fraction of people in all states + DC, I had rounded the effect of sampling error in, say, Wyoming down to zero).

BTW, the MOE as reported by pollsters is about one standard error of the difference Bush-Kerry. A 95% interval for the difference would be twice as big. Pollsters give 95% intervals for Bush support and for Kerry support separately. The standard error of the difference is almost exactly twice the standard error for each candidate, since saying one will vote for Bush and saying one will vote for Kerry have correlation close to -1 (it would be -1 except for the undecided, the Nader supporters, etc.).
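To check the "almost exactly twice" claim, here is a small sketch that treats a poll as a simple random sample in which each respondent picks Bush, Kerry, or something else. The formulas are the standard multinomial ones; the function names are mine.

```python
import math

def se_share(p, n):
    """Standard error of one candidate's share."""
    return math.sqrt(p * (1 - p) / n)

def se_lead(p_a, p_b, n):
    """Standard error of the lead p_a - p_b from the same sample.

    Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b), with Cov(a, b) = -p_a * p_b / n,
    which simplifies to (p_a + p_b - (p_a - p_b)^2) / n.
    """
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)

def correlation(p_a, p_b):
    """Correlation between 'says a' and 'says b' for one respondent."""
    return -math.sqrt(p_a * p_b / ((1 - p_a) * (1 - p_b)))

bush, kerry, n = 0.47, 0.46, 600
ratio = se_lead(bush, kerry, n) / se_share(bush, n)

print(f"SE of lead / SE of one share: {ratio:.2f}")
print(f"correlation: {correlation(bush, kerry):.2f}")
```

The ratio comes out a little under 2, and it would be exactly 2 in a race with no undecideds or third-party voters (both shares at 50%, correlation -1).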

2 comments:

Swami said...: Something maybe some of the people here can help me understand. There are a lot of people who don't answer their phone; people who now don't have wireline phones at all; people who work at odd hours and aren't home to answer the phone, etc. My understanding is that telephone polls pick people by randomly selecting telephone numbers, and then by randomly selecting people within the household that answers. But given all the trends I mentioned above, it seems hard to believe that polls don't have systematic bias in their samples. Does anyone have any insight on this?

Swami; 8:07 PM
Tom Bozzo said...: I saw a methodology note on the cord-cutter issue somewhere, and the argument (right or wrong, I'm just reporting) was that there's always been an issue of people without phones, and the cord-cutters are viewed as just making the same old problem worse (slightly, for now). Gallup claims (in a Gallup blog response) to poststratify (weight) their results to deal with harder-to-reach demographic groups.

Response rates for phone surveys are low and have been dropping, but for nonresponse to lead to bias, you have to make an argument like cw's that there are some systematic differences between respondents and nonrespondents. The response rates are low enough that the issue can't be ignored a priori, though.

Anyhow, a sort-of amusing thing is that a Wisconsin state poll was released today showing a huge Bush lead but, as it turns out, with the sort of funky internals that have been firing everyone up (seemingly excess Republican identification, loony 2000 vote responses). Both campaigns and the local press were discounting it as an outlier long before the detailed results hit the UW survey center's website.; 3:17 AM