Sunday, June 14, 2009

Rule number 1: Nate Silver is always right.
Rule number 2: If Nate Silver is wrong, consult rule number 1.

Andrew Sullivan says that the almost perfect correlation of each Iranian presidential candidate's vote share across successive official reports of partial vote counts is proof of fraud.

Nate Silver notes a similar correlation for the 2008 US presidential election.

I am reluctant to challenge Nate Silver on statistical analysis, but I don't find his argument convincing.

The US-Iran analogy is odd, because the US is huge, spread across several time zones, and (at least on election day) federal. Nationwide in the USA, partial vote counts depend in large part on time zones.

Iran is a large country, but Tehran is a proportionally huge city. Iran is not Yugoslavia, but there are major ethnic differences by region. In the USA there isn't as much gross regional segregation.

So vote counting in Iran is much more centralized, and Iran itself has larger regional variation in actual voting (at least it always did until this election).

I think a reasonable comparison would be with a *state level* election in the USA. At the state level, come on, is there usually a 99% correlation in vote shares across partial totals? It is very often possible to see that the candidate who has more votes counted so far ends up losing.

Show me a graph that looks like Sullivan's graph with data from a New York State election, and I'll be convinced of the election-stealing competence of the Iranian interior ministry. Till then, I will remain convinced that there was not just fraud but unusually blatant fraud.

Update: Well, the rule "criticize Silver and you look like a fool" seems to hold, but not in the way I thought. From Silver's comment thread (via Andrew Sullivan):

The problem with Nate's analysis of the Iranian election plot is that he split the states effectively randomly (by alphabetical order). The results as they were released in Iran were not released in such a random way. The "waves" that he describes were regional. You would not see that sort of a straight line if you were to split the waves into geographic regions the way that they should. Also, he didn't address your question about the precedent for such a high turnout and such walloping by the incumbent candidate.

Oops, I didn't notice the alphabetical business. Running vote totals built by adding states in effectively random order have nothing to do with vote totals as they are actually counted.

It used to be that in the USA a Democrat would start out ahead, then fall behind as results trickled in from small towns. Now Republicans start out ahead and fall behind, as results flow in fast via the internet and it takes longer to count votes in precincts with a lot of voters. There used to be a pattern: big cities, then small towns. Now there is the opposite pattern: small towns, then big cities.

Any simulation that is not based on actual partial vote counts as they are announced is irrelevant. It is possible that such a simulation will give results similar to actual partial vote counts. It is possible that totally made-up numbers will give results similar to actual partial vote counts. Neither is a valid approach.
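The point about counting order can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not Silver's method and not real election data: many hypothetical small-town precincts where candidate A gets 60%, and a few larger hypothetical city precincts where A gets 40%. The same precincts are added up in two orders, small towns first versus shuffled (a stand-in for splitting units effectively at random, e.g. alphabetically). The function names are hypothetical.

```python
import random


def cumulative_shares(precincts):
    """Candidate A's running share of the two-candidate vote after each precinct reports."""
    a_total = 0
    all_total = 0
    shares = []
    for a_votes, b_votes in precincts:
        a_total += a_votes
        all_total += a_votes + b_votes
        shares.append(a_total / all_total)
    return shares


def early_leader_loses(shares, early=5):
    """True if whoever leads after `early` reports is not the final winner."""
    return (shares[early - 1] > 0.5) != (shares[-1] > 0.5)


# Hypothetical precincts, invented for illustration: (votes for A, votes for B).
small_towns = [(600, 400)] * 20   # many small precincts, A gets 60%
big_cities = [(4000, 6000)] * 5   # few large precincts, A gets 40%

# Order 1: small towns report first (they are quicker to count).
regional = small_towns + big_cities
# Order 2: the same precincts shuffled, a stand-in for an effectively
# random ordering such as alphabetical.
shuffled = regional[:]
random.Random(0).shuffle(shuffled)

for name, order in [("small towns first", regional), ("shuffled", shuffled)]:
    s = cumulative_shares(order)
    print(f"{name}: A after 5 reports {s[4]:.3f}, final {s[-1]:.3f}, "
          f"early leader loses: {early_leader_loses(s)}")
```

In the small-towns-first order, A sits at 60% for the first 20 reports and finishes at about 45.7%, so the early leader loses. The shuffled order ends at exactly the same final share, but its path depends on where the city precincts happen to land, which is why an ordering unrelated to how votes were actually reported says little about real partial counts.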

I just assumed that Silver must have used actual partial vote totals as reported over time. I am astounded that Silver even ran a simulation of what happens over time that is not based on actual events in time.

