Robert's Stochastic thoughts

Friday, October 14, 2005

Standard Errors in Extreme cases

According to the latest NBC/Wall Street Journal poll, 2% of African Americans approve of the job George W. Bush is doing. That's impressively close to unanimous (though I think that Abbas, back when he was prime minister of the Palestinian National Authority, managed to go even lower).

Amusingly Judd at Think Progress writes "To be fair, the margin of error on the poll is 3.4%. So Bush’s actual approval among African Americans could be anywhere from -1.4% to 5.4%." I assume he is joking. Of course there are no negative African Americans (not in the sense of nattering negroes of negativism but less than zero people) so no matter what Bush has done it is not possible that the true population fraction of African Americans who approve of him is -1.4%.

So what is the standard error really, and why is the usual calculation of plus or minus two standard errors wrong in this case? First of all, the standard error of a poll depends on the true fraction of people in the population who answer yes. The convention among pollsters is to calculate the standard error assuming that the true population is split half and half, which gives the largest possible standard error. If the true fraction who approve is p and the sample size is N, then the standard error of the percent in the sample who approve is the square root of p times (1 minus p) divided by N, times 100%:

sqrt(p(1-p)/N) × 100%

A consistent estimate of this standard error can be obtained by using the fraction responding yes in the sample in place of p. In this case the consistent estimate of the standard error is about 0.5%, not 1.75%. Presumably pollsters use the higher number because they don't want to frighten people by saying that the standard errors of percentages differ from question to question, so they give the upper bound for all questions.
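As a rough check on those two numbers, here is a minimal Python sketch of the usual formula sqrt(p(1-p)/N). The post does not state the poll's overall sample size, so `n_assumed` below is an assumption: it is backed out from the quoted 3.4% margin of error, treating that margin as two worst-case (p = 0.5) standard errors. The function and variable names are mine, not the pollsters'.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion, as a fraction (not a percent)."""
    return math.sqrt(p * (1.0 - p) / n)

# Assumption: the 3.4% margin of error equals two standard errors at p = 0.5,
# so 2 * sqrt(0.25 / n) = margin, which gives n = (1 / margin) ** 2.
margin = 0.034
n_assumed = round((1.0 / margin) ** 2)

se_worst = standard_error(0.5, n_assumed)    # the pollsters' upper bound
se_plugin = standard_error(0.02, n_assumed)  # plug in the sample fraction of 2%

print(f"assumed n = {n_assumed}")
print(f"worst-case SE: {100 * se_worst:.2f}%")
print(f"plug-in SE at p = 2%: {100 * se_plugin:.2f}%")
```

The same function can be called with a smaller n, such as the size of a subgroup within the poll, to see how much the standard error grows when only part of the sample is used.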

Now, the rule that plus or minus two standard errors gives a 95% confidence interval only works if the statistic has a normal distribution around its expected value. This is an excellent approximation for proportions of yes answers if the population fraction of yes answerers is near 50% and the sample size isn't tiny (30 observations are plenty if the true fraction is 50%). Of course, the percentage of respondents answering yes in a sample can't be exactly normally distributed, since that percentage can't be negative or greater than 100%. The normal approximation is no good for a population percentage of yes answerers around 2% unless the sample size is huge. Clearly the NBC/WSJ sample is nowhere near big enough for the percent of African Americans in the sample who approve of Bush to be normally distributed around the percent of all African Americans who approve of Bush.
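To see how badly the normal approximation does here, the following Python sketch compares the exact binomial distribution with the matching normal curve. It assumes a true approval rate of exactly 2% and uses n = 89, the number of African Americans in the poll according to the update further down. The helper names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Exact probability of k yes answers among n independent respondents."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal(mean, sd) variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

n, p = 89, 0.02                      # assumed subsample size and true fraction
mean = n * p                         # expected number of yes answers
sd = math.sqrt(n * p * (1.0 - p))    # standard deviation of that count

# Probability the normal approximation puts on an impossible negative count.
p_negative_normal = normal_cdf(0.0, mean, sd)

# Exact probability of the lowest possible outcome, zero yes answers.
p_zero_exact = binom_pmf(0, n, p)

print(f"normal approximation, P(count < 0): {p_negative_normal:.3f}")
print(f"exact binomial, P(count = 0): {p_zero_exact:.3f}")
```

Under these assumptions the normal curve places a noticeable share of its probability below zero, which is the sense in which plus or minus two standard errors can produce a negative lower bound.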

Update: the poll included 89 African Americans. 2% would be, in plain English, 2; that is, two of the African Americans who were polled said they approved of the job Bush was doing. Assuming random sampling, it is possible to calculate the exact probability that two people in the poll would so answer as a function of the fraction (p) of all African Americans who would say they approve. This probability is

(1-p)^87 * p^2 * 89 * 44, where * means times and ^ means to the power of (89 * 44 = 3916 is the number of ways to pick 2 respondents out of 89).
This means that it is possible to reject the null that true population approval is 7% or above at the 97.5% level using a one-tailed test, and to reject the null that true approval is 0.3% or below at the 97.5% level using a one-tailed test. This gives a 95% interval of a sort, from 0.3% to 7% (not a confidence interval in the standard sense of the word, but something which coincides with a confidence interval for normally distributed statistics and is a more reasonable interval for inference in this case).
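The interval above can be reproduced approximately with a short Python sketch. It makes one assumption about how the tests are defined: the upper limit is the p at which the probability of seeing 2 or fewer approvers out of 89 falls to 2.5%, and the lower limit is the p at which the probability of seeing 2 or more falls to 2.5%. The bisection helper and all names are mine. With this tail definition the lower limit should land close to 0.3%, while the upper limit may come out somewhat above 7%.

```python
import math

def binom_pmf(k, n, p):
    """Exact probability of k yes answers among n respondents."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def prob_at_most(k, n, p):
    """P(count <= k) under the binomial distribution."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def prob_at_least(k, n, p):
    """P(count >= k) under the binomial distribution."""
    return 1.0 - prob_at_most(k - 1, n, p)

def bisect(f, lo, hi, iters=100):
    """Root of a monotone function f on [lo, hi], given a sign change."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

n, k, alpha = 89, 2, 0.025   # 2 approvers out of 89, 2.5% in each tail

# Largest p not rejected: P(count <= 2) has dropped to alpha.
upper = bisect(lambda p: prob_at_most(k, n, p) - alpha, k / n, 1.0)
# Smallest p not rejected: P(count >= 2) has dropped to alpha.
lower = bisect(lambda p: prob_at_least(k, n, p) - alpha, 0.0, k / n)

print(f"interval: {100 * lower:.2f}% to {100 * upper:.2f}%")
```

Unlike plus or minus two standard errors, an interval built this way is asymmetric around the sample fraction and can never dip below zero.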
