Sunday, November 04, 2012

Further Recklessness

Yesterday I broke my rule (well not the only one but ... pretty much) and made some forecasts.

Now I am discussing statistics and meta-analysis with Nate Silver. Read his post anyway.



This is an unusually excellent post.  

My comment


I want to stick on the very first and simplest issue -- random sampling error.  There is definitely something funny going on in polls with regard to random sampling error -- the results of different polls are more similar than they should be.

You have noticed this and pretty much insinuated that pollsters are deliberately herding -- fiddling their assumptions to get their polls similar to the average poll.  I don't recall the exact words (they were diplomatic) and, of course, no pollster was singled out.

But I think there is a simpler and more innocent explanation -- demographic weighting. The stated sampling error ignores the use of weights to make the sample of adults match the adult population. Pollsters do this in an effort to remove bias, but it can also reduce sampling error (it can also increase sampling error if a huge weight is put on a small subsample of, say, Hispanic women aged 18-25 or something).
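To make the weighting point concrete, here is a minimal sketch, not anyone's actual polling method. The group shares, support levels, and sample size are made-up numbers, and the function names are mine. It compares the textbook variance of a simple random sample with the approximate variance when group shares are pinned to their population values, and it uses Kish's rule-of-thumb design effect to show how one huge weight inflates variance.

```python
def srs_variance(shares, support, n):
    """Variance of the overall proportion under simple random sampling."""
    overall = sum(w * p for w, p in zip(shares, support))
    return overall * (1.0 - overall) / n


def poststratified_variance(shares, support, n):
    """Approximate variance when group shares are fixed at population values.

    Only within-group variation remains; randomness in the group sample
    sizes themselves is ignored (an approximation).
    """
    return sum(w * p * (1.0 - p) for w, p in zip(shares, support)) / n


def kish_design_effect(weights):
    """Kish's approximate variance inflation factor from unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


shares = [0.12, 0.88]    # hypothetical population shares of two groups
support = [0.95, 0.45]   # hypothetical candidate support within each group
n = 1000                 # hypothetical sample size

print(srs_variance(shares, support, n))             # variance ignoring weighting
print(poststratified_variance(shares, support, n))  # smaller: between-group noise removed
print(kish_design_effect([1.0] * 9 + [10.0]))       # one huge weight inflates variance
```

With these invented numbers the variance falls by about a tenth when the group shares are fixed, while a single respondent carrying ten times the weight of the other nine roughly triples it. Real weighting schemes use many more cells, so the sizes of both effects here are only illustrative.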

For some reason, sampling error is always reported as if the true population frequency of any response is 50%. This is an old tradition, presumably small-c conservative, since that assumption gives the maximum sampling error. I recall a poll of Israeli approval of Ehud Olmert where he had 2% +/- 3% approval. By the standard calculation it was quite possible that the true number of Israelis who approved of him was negative. Oops. This convention might blind pollsters to the fact that if, e.g., they make sure that the fraction of their sample of adults which is African American corresponds to the fraction of the adult population, they are eliminating a good bit of sampling error (example chosen because the sampling variance of Obama support in a poll of voting intentions of 100 African Americans would be assumed to be 0.0025 when it is, in fact, more like 0.00025).
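The arithmetic behind those numbers can be sketched as follows. The formula is the usual binomial one, p(1-p)/n. The sample size of 1067 is my assumption, back-solved from a +/- 3% margin at p = 0.5, and the 97% support figure is likewise just an illustrative guess.

```python
import math


def sampling_variance(p, n):
    """Variance of a sample proportion under simple random sampling."""
    return p * (1.0 - p) / n


def margin_of_error(p, n, z=1.96):
    """Half-width of the usual 95% interval for a sample proportion."""
    return z * math.sqrt(sampling_variance(p, n))


# Conventional worst case for 100 respondents: pretend p = 0.5.
print(sampling_variance(0.5, 100))

# A lopsided subgroup (97% support is an assumed figure) has far less variance.
print(sampling_variance(0.97, 100))

# Approval of 2% with an assumed n of 1067: the conventional margin uses
# p = 0.5, while the margin at the observed p is much narrower.
print(margin_of_error(0.5, 1067))
print(margin_of_error(0.02, 1067))
```

The first figure is the 0.0025 of the convention, and the second is of the same order as the 0.00025 above. For the approval example, the margin computed at 2% is under one percentage point rather than three, so the interval no longer dips below zero.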

This is quite important because estimates of random sampling error are a small deal when looking at sampling error in the average of many polls, but they are crucial to estimating the covariance of bias in different states, say. A given covariance across states of outcome minus forecast is more alarming if sampling error is smaller, and a given correlation is less alarming if sampling error is smaller.

An analogy (not baseball, finance). Asset prices include noise because assets are not perfectly liquid. Estimates of the joint probability of default of different MBS were low because correlations of the probability of default were underestimated, and they were underestimated because this noise in CDS prices was interpreted as variation in the conditional probability of default. The result was not pleasant. Ignoring sampling error when estimating the correlation of bias across states would lead to overconfident estimates. Overestimating sampling error would lead to under-confident estimates (I guess -- and I make this guess about your estimates).
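Here is a rough sketch of why the assumed sampling error matters for the correlation of bias. It assumes a simple model of my own, not anything from Silver's post: the observed error in each state is bias plus sampling noise, and the noise is independent across states and independent of the bias. Then the covariance of observed errors equals the covariance of bias, but the variance of bias is the observed variance minus the sampling variance. All the numbers are hypothetical.

```python
import math


def implied_bias_correlation(cov_errors, var_a, var_b, samp_var_a, samp_var_b):
    """Correlation of bias implied by observed errors in two states.

    Assumes error = bias + sampling noise, with noise independent across
    states and independent of bias (an illustrative model only).
    """
    bias_var_a = var_a - samp_var_a
    bias_var_b = var_b - samp_var_b
    if bias_var_a <= 0 or bias_var_b <= 0:
        raise ValueError("assumed sampling variance exceeds observed variance")
    return cov_errors / math.sqrt(bias_var_a * bias_var_b)


cov = 0.0002   # hypothetical covariance of (outcome - forecast) across two states
var = 0.0009   # hypothetical variance of the error in each state

# The same data imply different bias correlations depending on the
# sampling variance one assumes.
for assumed in (0.0, 0.0004, 0.0006):
    print(assumed, implied_bias_correlation(cov, var, var, assumed, assumed))
```

Setting the sampling variance to zero gives the lowest implied correlation of bias, which is the overconfident case. The larger the assumed sampling variance, the higher the implied correlation, so overstating sampling error pushes toward under-confidence.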
