Robert's Stochastic thoughts

Saturday, November 08, 2008

Random Yglesias

Polling We Can’t Believe In

I think a post-election Gallup tracking poll is really what we don’t need. If this past election revealed anything, it’s that all too many people don’t grasp the concept of “random variation” in this kind of thing. But suffice it to say that if you do a poll of roughly Gallup’s size and take a three day rolling average, you wind up generating all sorts of neat-looking vaguely sinusoidal curves that people then dream up narratives to “explain.” The whole thing’s a waste of time, and ultimate does more to misinform people than to help them understand the world.

For Yglesias it appears to be a methodological a priori that "neat looking vaguely sinusoidal curves", no matter how large, are all due to random variation. IIRC, in all of his posts on the topic, he has never tested the null that this or that vaguely sinusoidal curve can be explained by random variation, never calculated a standard error and never made any argument which has anything to do with "Gallup’s size".

If I calculated correctly, the sinusoidal curves which Yglesias thinks people wasted time on were not due to random sampling (that is, the null hypothesis that they were due to random sampling was rejected at the 5% level). They were temporary, and thinking up explanations was a waste of time, but that doesn't mean they were due to random sampling.

I find it odd to talk about sample sizes and variation due to random sampling without performing calculations.
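For what it is worth, the calculation in question is short. Here is a minimal sketch, assuming simple random sampling. The daily sample size and the candidate shares are hypothetical numbers I made up for illustration, not Gallup's actual figures, and the function names are mine.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)

def lead_se(p_a, p_b, n):
    """Standard error of the lead p_a - p_b when both shares come from one sample.

    Uses the multinomial variance Var(a - b) = [p_a + p_b - (p_a - p_b)^2] / n.
    """
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)

# Hypothetical inputs (assumptions, not Gallup's published numbers).
n_per_day = 900              # assumed interviews per day
days = 3                     # width of the rolling average
n_window = n_per_day * days  # interviews in one 3 day window

se_share = proportion_se(0.5, n_window)
se_lead = lead_se(0.48, 0.44, n_window)

print(f"SE of one candidate's share: {100 * se_share:.2f} points")
print(f"SE of the lead:              {100 * se_lead:.2f} points")
```

With these assumed inputs the standard error of the lead is a bit under two points, so a swing in the lead much larger than about four points between windows that share no interviews would be hard to blame on sampling alone.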

update: This is at his usual level of brilliance (which is high) as he finds something interesting to say on the topic of "Why Newt Gingrich shouldn't be President" which is about like finding something interesting to say on "one plus one equals two." No snark here I mean it -- brilliant. On the other hand I thought he didn't like "scare quotes".

update:

IIRC Yglesias argued that convention bounces were just noise. This is very hard to reconcile with data over many elections. Also hard to reconcile with data over many polls.

I claimed that there was a significant bounce from Obama's overseas trip (good for Obama). In each case, I think my data mining was relatively minor: I decided in advance to look for a bounce at about that time. The choice of exact days was ex post, and allowances for that can push p-levels over 5%.

OK, so what makes me convinced? First, the alleged bounces also appear in other polls (more or less).

I would say there are two competing hypotheses.

H1: noise due to random sampling fed through a 3 day filter creates curves to which people give interpretations.

vs

H2: There are short lived shifts in intentions to vote in the population (perhaps even corresponding to the interpretations) added onto the sampling noise.

I think it is easy to test the null H1 against the alternative H2, but I haven't done it. According to H2, numbers in different polls are positively correlated. Also, numbers from one poll will be correlated even if the moving average windows don't overlap.
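The two predictions can be illustrated with simulated data. This is only a sketch of the idea, not the test run on real polls: the sample size, the size and persistence of the population shifts, and the normal approximation to sampling noise are all assumptions I chose for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def true_shares(n_days, rng, persistence, shock_sd):
    """Population share by day. persistence = 0 and shock_sd = 0 gives H1."""
    shares, shift = [], 0.0
    for _ in range(n_days):
        # Short lived shift: decays geometrically, refreshed by small shocks.
        shift = persistence * shift + rng.gauss(0.0, shock_sd)
        shares.append(0.5 + shift)
    return shares

def poll(shares, n_per_day, rng):
    """Daily readings: true share plus sampling noise (normal approximation)."""
    return [rng.gauss(p, (p * (1 - p) / n_per_day) ** 0.5) for p in shares]

def window_means(daily, width=3):
    """Averages over consecutive windows that do not overlap."""
    return [sum(daily[i:i + width]) / width
            for i in range(0, len(daily) - width + 1, width)]

def correlations(persistence, shock_sd, n_days=3000, n_per_day=1000, seed=0):
    rng = random.Random(seed)
    shares = true_shares(n_days, rng, persistence, shock_sd)
    a = window_means(poll(shares, n_per_day, rng))  # pollster A
    b = window_means(poll(shares, n_per_day, rng))  # pollster B, same population
    across_polls = pearson(a, b)
    adjacent_windows = pearson(a[:-1], a[1:])
    return across_polls, adjacent_windows

h1 = correlations(persistence=0.0, shock_sd=0.0)   # sampling noise only
h2 = correlations(persistence=0.9, shock_sd=0.01)  # hypothetical real shifts
print("H1 (across polls, adjacent windows):", h1)
print("H2 (across polls, adjacent windows):", h2)
```

Under H1 both correlations should be near zero, because independent samples share nothing once the windows stop overlapping. Under H2 both should be clearly positive, because the two pollsters, and adjacent windows of one pollster, are tracking the same underlying shift.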

I do not think it makes sense to ask people how often they changed their minds or to repeatedly interview the same people. Consider an example. Mr X sometimes feels "undecided but maybe leaning towards Obama" and sometimes feels "undecided but maybe maybe maybe leaning towards Obama". I think it is possible that if he is in state 1, he will respond to a pollster pushing him for a name by saying Obama, and in state 2 he will insist on undecided (pollsters press people who say they are undecided). This is a true change in voting intentions as measured by pollsters. It has nothing to do with sampling error. It also has little to do with electoral outcomes.

I think that my hypothetical Mr X would have no sense that he changed his mind. However, if he is in state 1 at t and in state 2 at t plus one week, the results of polls including him will be different. Most importantly, I would guess that if he is interviewed at t and says Obama, then re-interviewed at t + 1 week, he will say Obama again with positive probability (note I assume that if not asked at t then he would say undecided at t + 1 week). Once people have stated an opinion, they are somewhat reluctant to say they have changed their minds unless they think they have a good reason to change their minds (anchoring has been experimentally demonstrated even if the original statement was in answer to a question of the form "pick a random number between ...").

Now, if he exists, my Mr X creates a problem roughly similar to sampling error. The shifts in voting intention are so tiny that a more exact measure would be a shift of 0 votes, not one (for example, maybe the probability that he votes for Obama goes from 51% to 53%, which is more like 50% to 50% than like 50% to 100%). This problem is not removed by refraining from pushing undecided voters or by counting only voters who say they are sure how they will vote. Ms Y might drift back and forth from "undecided leaning Obama" to "probably Obama", so that her responses shift from undecided at first, then Obama if pressed, to Obama at first. Mr Z might use "sure" to mean subjective probability 95% and have his subjective probability that he will vote for Obama drift from 94% to 96%. Anywhere you draw the line, there will be people close to it who drift across for no important reason and with almost exactly 0 change in the probability distribution of their actual vote on election day.

Now, as a practical matter, I entirely agree with Yglesias, that the effort to explain the movements of polls is wasted (at best). A shift from "undecided maybe leaning towards Obama" to and from "undecided maybe maybe maybe leaning towards Obama" tells us almost nothing about the expected outcome and so is almost exactly as informative as shifts due to sampling error. I mean if it is August the difference between short lived and non-existent is immaterial.
