Friday, April 19, 2013

Reinhart Rogoff data analysis

I am analyzing the data set used by Reinhart and Rogoff and by Herndon, Ash and Pollin in their critique.
In particular I am analyzing the Stata data set RR-processed.dta, with data on public debt to GDP ratios and real GDP growth in 20 developed countries since 1946.

I will always use all of the available data and weight country*year observations equally (that is, use the HAP approach).

First I note (as many, including R-R, have) that the original work shows no particular evidence of a critical level of debt around 90% of GDP.  The 90% level was chosen by R and R.  It is not at all possible to see whether the underlying association is nonlinear by looking at the average growth rate in an open-ended category.  In particular, the average debt/GDP ratio in the over-90% category could be, say, 900% for all the reader of the table knows.  This should be obvious, but I present two tables. The first corresponds to the Herndon, Ash and Pollin (2013) PERI WP 322 results.

While much less dramatic than the R-R table, it does seem to show a nonlinearity as one goes from category 3 (debt between 60% and 90% of GDP) to category 4 (debt over 90% of GDP).
However, this is largely due to the fact that the increase in the category average debt to GDP ratio is very large, as is shown by the following table.

The increase in the subsample average debt ratio from category three to four is over 47.9 percentage points of GDP; the increase from category one to category three is under 53.6 percentage points. So the difference in debt ratios is about the same going from category one to three as from three to four.  Likewise, the corresponding differences in growth rates, over 0.98 and under 1.02 percentage points, are about the same.  There is no hint of nonlinearity.  The cutoff was imposed by R and R and does not reflect a result.  Of course this has been shown much better and much more thoroughly by HAP and then by their colleague Arindrajit Dube using nonparametric regressions.

Ah yes, Dube's main point is that the data suggest that causation runs from low growth to a high debt to GDP ratio and not from high debt to low growth.  I have a little bit to add to that (but mainly click the link).

Some more nearly interesting results relate to the direction of causation.  There is a standard approach to testing whether a correlation reflects causation from x to y.  It is called a test of the weak exogeneity of x.  Really it just checks whether the OLS estimate is similar to an instrumental variables estimate.  I will use the 5 year lagged debt/GDP ratio as an instrument.  The idea is that if high debt causes slow growth, then high debt which is predictable in advance based on the lagged ratio should have the same association with growth as debt surprisingly accumulated in the past 5 years.  This is a lot of talk to explain a very simple regression with the 5 year lagged ratio included.
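
The logic of that regression can be written out in three lines. This is only a sketch: g (real GDP growth), d (the debt/GDP ratio), pi, u, beta_1, beta_2 and gamma are notation introduced here for illustration, not symbols used by R-R or HAP, and the intercept of the first line is left out because it would only change the constant term.

```latex
\begin{align}
% Split current debt into a part predictable from the 5 year lag and a surprise u
d_{it} &= \pi\, d_{i,t-5} + u_{it} \\
% Allow growth to respond differently to predictable debt and to surprise debt
g_{it} &= \alpha + \beta_1\, \pi\, d_{i,t-5} + \beta_2\, u_{it} + \varepsilon_{it} \\
% Substitute u_{it} = d_{it} - \pi\, d_{i,t-5}
g_{it} &= \alpha + \beta_2\, d_{it}
          + \underbrace{(\beta_1 - \beta_2)\,\pi}_{\gamma}\, d_{i,t-5}
          + \varepsilon_{it}
\end{align}
```

If predictable debt and surprise debt have the same association with growth (beta_1 = beta_2), then gamma, the coefficient on the lagged ratio, is zero and the last line collapses to the simple regression of growth on current debt. A coefficient on the lagged ratio that is far from zero says the two kinds of debt are not alike.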

Here are some results

The simple regression showing a tiny negative effect of debt/GDP on the subsequent year's real GDP growth

This says that increasing the ratio by 10 percentage points of GDP is associated with a decline in the rate of growth of 0.18 percentage points per year.  Basically there was never any need to go on.  This is a tiny effect, completely dwarfed by Keynesian stimulus effects for modest multipliers. This has long been noted.
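
One way to get that 10-point figure straight from Stata, with a standard error attached, is lincom. This is a minimal sketch that assumes RR-processed.dta is already in memory, as in the do file at the end of the post.

```stata
* Assumes RR-processed.dta is loaded (see the do file at the end of the post)
reg dRGDP debtgdp

* Effect on growth of a 10 point rise in debt/GDP: 10 times the slope,
* reported with its standard error and confidence interval
lincom 10*debtgdp
```

The point estimate lincom reports is just ten times the coefficient on debtgdp, so it adds nothing new beyond sparing the mental arithmetic and rescaling the confidence interval.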

But there is great interest in the data set anyway.

Next I have to exclude observations for which I don't have the lagged ratio, so the next regression

. reg dRGDP debtgdp if l5debtgdp!=.

says to use only the observations for which the 5 year lagged debt to GDP ratio (l5debtgdp) is not missing.  This matters almost not at all.

OK now finally the regression of interest

. reg dRGDP debtgdp l5debtgdp if l5debtgdp!=.

This is a regression of one year's real GDP growth on the debt to GDP ratio and the 5 year lagged debt to GDP ratio (l5debtgdp).

OK that's about it.   If one trusts the standard error calculation, one concludes that there is very very strong evidence that, given debt now, it is much better to have been highly indebted already 5 years ago.  This is the pattern one would expect if low growth caused a high debt to GDP ratio.  Future growth is low if debt is higher than one would predict given debt 5 years ago, presumably because the extra debt is the result of disappointing growth, and growth rates are serially correlated.

Old debt is not so damaging.  This means that it comes out looking as if old debt is positively a good thing (really the regression doesn't show this; it shows that if you have debt, it is better for it to be old).
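
The reverse causation story can be checked on made-up data. The sketch below is a hypothetical simulation, not the RR data: every number in it (20 countries, 60 years, 3% mean growth, the 0.5 and 2 parameters, the seed) is an assumption chosen for illustration. Growth is serially correlated and never depends on debt, while debt rises whenever growth is below its mean.

```stata
* Hypothetical simulation, NOT the RR data: growth never depends on debt here
clear
set seed 20130419
* 20 made-up countries, 60 years each
set obs 20
gen cntry = _n
expand 60
bysort cntry: gen year = _n
xtset cntry year

* Growth is AR(1) around 3% a year, so growth rates are serially correlated
gen shock = rnormal()
gen growth = 3 + shock
bysort cntry (year): replace growth = 3 + 0.5*(growth[_n-1] - 3) + shock if year > 1

* Debt/GDP rises whenever growth is below 3% (reverse causation only)
gen debt = 60
bysort cntry (year): replace debt = debt[_n-1] - 2*(growth - 3) if year > 1

* Same specification as the regression of interest
reg growth debt L5.debt
```

With this setup the coefficient on current debt should come out negative and the coefficient on 5 year lagged debt positive, the same sign pattern as in the regression above, even though debt has no effect on growth by construction.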

OK that's just the first regression of interest.  It is a barbaric pooled regression in which I am assuming that, aside from debt, countries are all alike and also that, again aside from debt, years are all alike.  The fact that quite a few countries had poor growth in say 1975 is a complete mystery to the poor computer.  

The next regression includes a complete set of country dummies (so it is a country fixed effects regression)

. reg dRGDP debtgdp l5debtgdp count* if l5debtgdp!=.

Well that's pointless.  Let me zoom in on the coefficients of interest

Oh, the coefficient on the lagged ratio is actually a bit larger. There is no hint that the coefficient in the first regression of interest was due to omitting country effects.
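
A way to get the same country fixed effects estimates without printing the dummy coefficients is areg, which absorbs the country effects. This is a minimal sketch that assumes cntry and l5debtgdp have been created as in the do file at the end of the post.

```stata
* Assumes cntry and l5debtgdp exist (see the do file at the end of the post)
* Country fixed effects are absorbed rather than estimated as explicit dummies
areg dRGDP debtgdp l5debtgdp if l5debtgdp!=., absorb(cntry)
```

The slope coefficients on debtgdp and l5debtgdp are the same as in the regression with count*; only the output is shorter.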

Now I add a complete set of year dummies (many of which are dropped because I don't have lagged debt for that year and one of which is dropped because they all add up to the constant term).

(note I can't capture all of the boring coefficients)

Again zooming in

The coefficient is about three fourths as large as the original barbaric pooled estimate.  Yes, some of the "bad growth causes a high debt to GDP ratio" effect is picked up by year dummies.  But the remaining country-specific part of the reverse causation effect remains about as large as the original growth-debt correlation.

I am not really saying that the null that R-R correctly interpreted their regression is rejected at the 0.001% level.  You shouldn't believe the standard errors. Stata calculated them under the assumption that the disturbances to growth rates are not serially correlated, and the motivation for the regression is that they are.

The simple (for the simpleminded Stata user) way to deal with this is to recalculate the standard errors allowing unrestricted correlation of errors for the same country.  This is a new calculation of the standard errors for the exact same coefficient estimates.

. reg dRGDP debtgdp l5debtgdp  if l5debtgdp!=.,cluster(cntry)

gives a new t-statistic on l5debtgdp of 3.19, markedly smaller than the uncorrected t-stat of 5.11 but still very significant.

Including country dummies and calculating serial correlation robust standard errors

. reg dRGDP debtgdp l5debtgdp count*  if l5debtgdp!=.,cluster(cntry)

gives a t-stat of 3.76, again much smaller than the uncorrected t-stat of 6.35 but still very significant.

I'd love to stop here (of course) but including country effects, year effects and allowing for serially correlated disturbances over time for the same country is just too much for the poor t-statistic which falls to 1.14.

I don't worry much about this.  R-R don't include year dummies or calculate standard errors at all, so I don't think I have to when critiquing them (also I started writing before estimating that regression).

OK I can't resist one more regression. Since the validity of OLS on the current debt to GDP ratio is highly suspect, how about a regression on the 5 year lagged ratio?

The coefficient is smaller (no surprise; this isn't evidence of reverse causation bias at all).  The t-statistic is -3.40 under the assumption of serially uncorrelated disturbances.  The t-statistic becomes -1.44, that is, statistically insignificantly different from zero, when standard errors are calculated allowing serial correlation within a country.

Finally, here is the do file which I used to generate the results from the HAP version of the RR data set:

use C:\rjw\Papers\Peri\RR-processed.dta

quietly tab Country,gen(count)
gen cntry = count1+2*count2+3*count3+4*count4+5*count5+6*count6 + 7*count7 + 8*count8+9*count9+10*count10+11*count11+12*count12+13*count13+14*count14+15*count15+16*count16+17*count17+18*count18+19*count19+20*count20
quietly tab Year, gen(yr)
sort cntry Year

gen debtcat = 1 + floor(debtgdp/30)
replace debtcat = 4 if debtcat>4
gen episode = 1 in 1/1
replace episode = episode[_n-1]+1-(debtcat==debtcat[_n-1])*(cntry==cntry[_n-1]) in 2/1175

gen debtct1 = debtgdp<30
gen debtct2 = (debtgdp<60 & debtgdp>=30)
gen debtct3 = (debtgdp<90 & debtgdp>=60)
gen debtct4 = debtgdp>90

gen debtm90 = (debtgdp-90)*(debtgdp>90)

gen l5debtgdp = debtgdp[_n-5] if cntry==cntry[_n-5]

tab debtcat,sum(dRGDP)
tab debtcat,sum(debtgdp)

reg dRGDP debtgdp debtm90

reg dRGDP debtgdp
reg dRGDP debtgdp if l5debtgdp!=.

reg dRGDP debtgdp l5debtgdp if l5debtgdp!=.
reg dRGDP debtgdp l5debtgdp count* if l5debtgdp!=.
reg dRGDP debtgdp l5debtgdp count* yr* if l5debtgdp!=.

reg dRGDP debtgdp l5debtgdp if l5debtgdp!=., cluster(cntry)
reg dRGDP debtgdp l5debtgdp count* if l5debtgdp!=., cluster(cntry)
reg dRGDP debtgdp l5debtgdp count* yr* if l5debtgdp!=., cluster(cntry)

reg dRGDP l5debtgdp if l5debtgdp!=.

reg dRGDP l5debtgdp if l5debtgdp!=.,cluster(cntry)


Fiura said...

Hi Robert, there are plenty of posts on RR out there, but yours is the best I've read so far.

Hans Suter said...

I have no intellectual access to your beautifully colored figures, but the fact that the 90% rule was fetched out of thin air reminds me of another economist's work: the 3% deficit rule.