Sunday, April 28, 2019

Barry Ritholtz asks how the New York Times got two very different figures from the same data

Barry Ritholtz has doubts about data presentation at the New York Times.

His commentary is very brief: "What is this about? Is it guilty conscience, or something else?"

The graphs are strikingly different.

[Figure 1 and Figure 2: the two charts, not reproduced here]

Alberto Cairo explains

I have a guess about what happened: the second figure shows a regression with counties weighted by population, and the first shows an unweighted regression. I consider both semi-reasonable things to do, but I think weighting is better (the weighted figure is also the later of the two to be produced, if I understand correctly).

First, the fact that the size of the circles depends on total votes in the second figure suggests that the regression was weighted by total votes. Second, it is clear that the big blue counties pull the line more in the second figure. I also note that the estimated effect of government assistance on the Trump vote is greater in the second figure.
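To make the weighted versus unweighted distinction concrete, here is a minimal sketch in Python. The numbers are made up for illustration and are not the New York Times data; the function `fit_line` and the variable names are my own hypothetical choices. It only shows how one large county can move the fitted line when observations are weighted by total votes.

```python
import numpy as np


def fit_line(x, y, weights=None):
    """Return (intercept, slope) minimizing sum_i w_i * (y_i - a - b * x_i)**2.

    With weights=None every observation gets weight 1, which is ordinary
    (unweighted) least squares.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)

    # Weighted means of x and y.
    x_bar = np.average(x, weights=w)
    y_bar = np.average(y, weights=w)

    # Closed-form slope and intercept for a one-variable weighted regression.
    slope = np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: four small counties on the line y = 40 + x,
# plus one large county with low assistance and a low Trump share.
assistance = np.array([10.0, 20.0, 30.0, 40.0, 5.0])    # assumed units: percent
trump_share = np.array([50.0, 60.0, 70.0, 80.0, 20.0])  # assumed units: percent
total_votes = np.array([1.0, 1.0, 1.0, 1.0, 50.0])      # assumed units: thousands

a_unw, b_unw = fit_line(assistance, trump_share)
a_wtd, b_wtd = fit_line(assistance, trump_share, weights=total_votes)

print(f"unweighted: intercept={a_unw:.2f}, slope={b_unw:.2f}")
print(f"weighted:   intercept={a_wtd:.2f}, slope={b_wtd:.2f}")
```

In this toy example the weighted line is steeper than the unweighted one, because the large county dominates the weighted fit. That is the same direction of change I describe above, though of course the toy data were chosen to show it.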

Others have another guess -- that the first line was hand drawn after eyeballing and isn't a regression line. That seems unlikely to me. Someone tweeted that it is clearly not an OLS regression. I suspect that the eye is even more influenced by outliers than OLS is. The dense cloud of many fairly similar observations does not impress us as much as it impresses a computer running OLS.

I think I'm going to give a hostage to fortune and guess that, if someone sends me the raw data, I can run an unweighted OLS regression and get figure 1. I feel pretty safe, because I am pretty sure few people will read this and none will download the data and e-mail it to robert.waldmann@gmail.com.