Thursday, July 03, 2008

Death to "Statistical Dead Heat"

People object to CNN calling the following poll a statistical dead heat. They are right, as the phrase serves no useful purpose. It is true that the results of the poll, by themselves, do not enable us to reject the null that the race is actually tied at the 95% confidence level (and there is nothing particularly special about the number 0.95).

July 01, 2008

POLL: CNN National

CNN/Opinion Research Corporation
6/26-29/08; 906 RV, 3.5%

Obama 50, McCain 45
Obama 46, McCain 43, Nader 6, Barr 3

CNN polled 906 people, of whom 45 or 46 didn't say Obama or McCain. The correlation between supporting Obama and supporting McCain is approximately -0.90.
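For anyone wondering where -0.90 comes from, here is a minimal sketch. It assumes a simple random sample in which each respondent falls into exactly one of three categories (Obama, McCain, other), so the counts are multinomial. The function name is mine, not anything standard.

```python
import math

def share_correlation(p_a, p_b):
    """Correlation between two category shares in a multinomial sample.

    For a multinomial, Cov(a, b) = -p_a * p_b / n and
    Var(a) = p_a * (1 - p_a) / n, so the sample size cancels out.
    """
    return -math.sqrt((p_a * p_b) / ((1 - p_a) * (1 - p_b)))

# Shares from the two-way CNN question: Obama 50, McCain 45.
rho = share_correlation(0.50, 0.45)
print(f"correlation = {rho:.3f}")
```

The correlation is strongly negative because almost everyone names one of the two candidates: a respondent gained by one is nearly always a respondent lost by the other.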

The maximum likelihood estimate of the variance of the fraction who support Obama
is 0.5*0.5/906 = roughly 0.000276, giving a standard error of roughly 0.0166 = 1.66%, so 2 standard errors = 3.32%.

The maximum likelihood estimate of the variance of the fraction who support McCain is 0.45*0.55/906 = about 0.000273, giving a standard error of about 0.0165, so 2 standard errors = 3.30% (why don't they ever admit that the variance of a binomial depends on the probability?).

The variance of the difference (the fraction that supports Obama - the fraction that supports McCain)
is not the sum of the variances given above, because the two fractions are negatively correlated (-0.9 < 0). It is
variance Obama + 1.8(se Obama)(se McCain) + variance McCain = about 0.00104, for a standard error of the difference of about 0.0323, so two standard errors of the difference Obama-McCain would be more than 6.4%, and 6.4% > 5%.
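The arithmetic can be checked with a short sketch. It assumes simple random sampling with multinomial counts, and it uses the exact multinomial covariance -p_O*p_M/n rather than the rounded correlation of -0.9, so the last digits may differ slightly from the figures in the text. Variable names are my own.

```python
import math

n = 906                         # registered voters polled
p_obama, p_mccain = 0.50, 0.45  # two-way question

# Binomial variance of each share.
var_obama = p_obama * (1 - p_obama) / n
var_mccain = p_mccain * (1 - p_mccain) / n

# Multinomial covariance between the two shares (negative).
cov = -p_obama * p_mccain / n

# Var(O - M) = Var(O) + Var(M) - 2 Cov(O, M); the covariance term ADDS here.
var_diff = var_obama + var_mccain - 2 * cov
se_diff = math.sqrt(var_diff)
lead = p_obama - p_mccain

print(f"se Obama  = {math.sqrt(var_obama):.4f}")
print(f"se McCain = {math.sqrt(var_mccain):.4f}")
print(f"se lead   = {se_diff:.4f}, 2 se = {2 * se_diff:.4f}, lead = {lead:.2f}")
```

The lead of 5 points is smaller than two standard errors of the difference, which is why this poll alone cannot reject a tie at the usual level.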

The +/- bands given by pollsters are valid for testing the null that a candidate has support equal to 50%. They are not valid for testing the null that two candidates have the same number of supporters in the population.

Now all of this is totally irrelevant for any practical use of polls. I agree that the phrase "statistical dead heat" has no place in rational discourse. You can always design a poll such that the result is a statistical dead heat. Just ask one person: one Obama supporter does not enable us to reject the null that McCain has more supporters in the population.

More importantly, two polls which are each "statistical ties" do not imply that we can't reject the null that McCain has more supporters in the population. The variance of the average of two independent measurements is only half of the average of the variances. Since many polls of Obama vs McCain are conducted and sensible people average them, the standard error of the average due to random sampling is tiny. Of course this doesn't mean that the outcome is determined, as people change their minds and there are many other sources of error in estimates of current opinion. It's just that sampling error is a relatively minor problem (so long as the sampling is really random). It gets absurd overemphasis because it is easy to calculate and, besides, journalists have no clue as to what use can be made of a standard error.
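To see how quickly sampling error shrinks under averaging, here is a rough sketch. It assumes the polls are independent and equally weighted, and it reuses the roughly 0.0323 standard error of the lead from a single 906-person poll. The poll counts of 2 and 10 are hypothetical, chosen only for illustration.

```python
import math

def se_of_average(standard_errors):
    """Standard error of the unweighted mean of independent estimates.

    Var(mean) = sum(se_i^2) / k^2, so the SE is sqrt(sum(se_i^2)) / k.
    """
    k = len(standard_errors)
    return math.sqrt(sum(se ** 2 for se in standard_errors)) / k

se_one_poll = 0.0323  # approximate SE of the Obama-McCain lead in one poll

for k in (1, 2, 10):  # hypothetical numbers of polls being averaged
    se_avg = se_of_average([se_one_poll] * k)
    print(f"{k:2d} poll(s): se of average lead = {se_avg:.4f}, 2 se = {2 * se_avg:.4f}")
```

With two such polls, two standard errors of the average lead already drop below the 5-point lead. This covers sampling error only and says nothing about the other sources of error mentioned above.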

1 comment:

BreadBox said...

I think that you are making the following mistake: you are assuming that most journalists have a clue about variance. They *think* that they have a clue about "margin of error": in reality they don't. This is true of almost all journalists, on all sides of public issues.
There are exceptions, of course: Chuck Todd, for example, probably has a clue, and the same may be true for one person at every other news network. But I seriously doubt that Wolf Blitzer understands these issues....

Nice piece.