Macroeconomic model comparisons and forecast competitions

by Volker Wieland and Maik Wolters

via Mark Thoma

Selective excerpts

We use two small micro-founded New Keynesian models, two medium-size state-of-the-art New Keynesian business-cycle models – often referred to as DSGE models – and for comparison purposes an earlier-generation New Keynesian model (also with rational expectations and nominal rigidities but less strict microeconomic foundations) and a Bayesian VAR model.

Note that one of the models has nothing to do with micro foundations and that three are not DSGE models. To jump ahead, I note that there is no comparison of the performance of these totally different approaches in the rest of their post.

Given this failure to predict the recession and its length and depth, the widespread criticism of the state of economic forecasting before and during the financial crisis applies to business forecasting experts as well as modern and older macroeconomic models. ... over purely model-based forecasts, were not able to predict the Great Recession either. Thus, there is no reason to single out DSGE models, and favour more traditional Keynesian-style models that may still be more popular among business experts. In particular, Paul Krugman’s proposal to rely on such models for policy analysis in the financial crisis and disregard three decades of economic research is misplaced.

I think Wieland and Wolters are totally completely utterly unfair to Krugman when they hold him responsible for the forecasts of professional forecasters. Krugman is responsible for Krugman's forecasts. He didn't give a numerical prediction for GDP, but he did predict in fall 2008 that there wouldn't be a quick recovery. I think Krugman outperformed all of the models and all of the professional forecasters. Concluding that Krugman was wrong based on their data is bizarre.

Note the gross category error of saying that Krugman's recommendation "**is**" misplaced (an unqualified statement in the indicative) because of something which professional forecasters "**may**" do. What is the chance that each professional forecaster does what W and W guess they may do? If some do and some don't, then the average professional forecast cannot be used to evaluate the forecasting performance of more traditional Keynesian approaches.

Finally, see below that W and W must concede that professional forecasters do better on average than their models, some of which have nothing to do with DSGE.

The model forecasts are on average less accurate than the mean SPF forecasts (see Wieland and Wolters 2011 for detailed results). ...

Computing the mean forecast of all models we obtain a robust forecast that is close to the accuracy of the forecast from the best model-

Note that they do not explain which model performs best. Since the models are as different as models can be, this is a shocking omission.

Conditioning the model forecasts on the nowcast of professional forecasters (reported in the paper) can further increase the accuracy of model-based forecasts. Overall, model-based forecasts still exhibit somewhat greater errors than expert forecasts, but this difference is surprisingly small considering that the models only take into account few economic variables and incorporate theoretical restrictions that are essential for evaluations of the impact of alternative policies but often considered a hindrance for effective forecasting.

Professional forecasters do not set a very high standard. It is very easy to improve the forecasts of most professional forecasters using no theory and almost no data (see Ehrbeck and Waldmann, Quarterly Journal of Economics (1996), Vol. 111, pp. 21–40; also note Solferino and Waldmann 2010, "Predicting the signs of forecast errors," Journal of Forecasting vol. 29(5), pages 476-485).

I am shocked that W and W treat DSGE models and Bayesian VARs as part of a uniform class of "models" and argue that the fact that they perform only slightly worse than professional forecasters is evidence that DSGE models are better than old Keynesian models. There is no similarity between Bayesian VARs and DSGE models. I do not believe that this gross conflation is the result of carelessness.

Does any reader of this post believe they would have presented "models" as a homogeneous group if DSGE models outperformed the less rigorously micro-founded new Keynesian models or if the theory-influenced models outperformed Bayesian VARs?

It is very odd that they consider a narrow set of conditioning variables and a small set of estimated parameters to be an unambiguous handicap. It is well known that richly parametrised models tend to have poor out-of-sample forecasting performance. The idea of limiting models based on theory was that it would give better forecasts, not that, of course, rigor hampers forecasting.
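The point about richly parametrised models can be illustrated with a minimal sketch in Python. Nothing here comes from W and W: the simulated AR(1) process, the sample sizes, the lag lengths and all function names are assumptions chosen for illustration. The sketch fits a one-lag and a twelve-lag autoregression on the same short sample and compares in-sample and out-of-sample mean squared errors.

```python
import random

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def lags(y, t, p):
    """Return [y[t-1], ..., y[t-p]]."""
    return [y[t - k] for k in range(1, p + 1)]

def fit_ar(y, p, start, end):
    """OLS fit of an AR(p) without intercept on the targets y[start:end]."""
    rows = [lags(y, t, p) for t in range(start, end)]
    targets = y[start:end]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)

def mse(y, coefs, start, end):
    """Mean squared one-step-ahead error on the targets y[start:end]."""
    p = len(coefs)
    errs = [y[t] - sum(c * l for c, l in zip(coefs, lags(y, t, p)))
            for t in range(start, end)]
    return sum(e * e for e in errs) / len(errs)

# Simulated data: the true process has a single lag (an assumption of this sketch).
random.seed(1)
y = [0.0]
for _ in range(299):
    y.append(0.5 * y[-1] + random.gauss(0, 1))

P_MAX, TRAIN_END = 12, 60  # both models are fitted on the same 48 targets
results = {}
for p in (1, P_MAX):
    coefs = fit_ar(y, p, P_MAX, TRAIN_END)
    in_sample = mse(y, coefs, P_MAX, TRAIN_END)
    out_of_sample = mse(y, coefs, TRAIN_END, len(y))
    results[p] = (in_sample, out_of_sample)
    print(f"AR({p}): in-sample MSE {in_sample:.3f}, out-of-sample MSE {out_of_sample:.3f}")
```

Because the one-lag model is nested in the twelve-lag model and both are fitted on the same observations, the larger model cannot fit worse in sample. Its out-of-sample error will typically be larger in a short sample like this one, though that is a tendency and not a guarantee for every draw.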

Finally, I think that W and W propose ignoring the past few decades of macroeconomic empirical research which has shown again and again that theory-based macro models can only fit patterns in the data if they are massaged ex post. The pattern they attempt to fit is a simple hump-shaped impulse response function. The pathetic failure of the models is shocking -- only to someone who hasn't been paying attention for the past few decades.

There is something else which I type with some reluctance.

"For each forecast we re-estimate all five models using exactly the data as they were available for professional forecasters when they submitted their forecasts to the SPF. Using these historical data vintages is crucial to ensure comparability to historical forecasts by professionals."

Now the legend for figure 1 "Solid black line shows annualised quarterly output growth (real-time data vintage until forecast starting point and revised data afterwards), grey lines show forecasts from the SPF, green line shows mean forecast from the SPF, red lines show model forecasts conditional on the mean nowcast from the SPF."

Click the link and look at figure 1. The numbers for 2008Q2 and 2009Q1 in figure 1 should be "revised."

On July 29, 2011 the BEA released revised estimates for GDP including GDP in 2008 and 2009.

The revisions did not change the timing of the contraction. The overall pattern of quarterly changes during the downturn was similar in both the revised and previously published estimates, though the revised estimates show larger decreases for 2008:Q4 (-8.9 percent compared with -6.8 percent) and for 2009:Q1 (-6.7 percent compared with -4.9 percent). The contributions of specific GDP components to the contraction were similar in both the revised and previously published estimates. (See the briefing on results of the 2011 NIPA annual revision.)

Figure 1 should show revised GDP growth for 2008Q4. It shows a contraction at an annualized rate on the order of 6%. It should show a contraction at an annualized rate of 8.9 percent. The data in the figure do not correspond to the legend -- they are incorrect.

Now look at figure 2. The contraction rate for 2008Q4 shown in the second panel of Figure 2 should have been available in 2009Q2, as the "nowcast" corresponds to 2009Q2. The number is very similar to those shown in figure 1 and the first panel of figure 2, which should be revised. There was a massive revision of this number made in 2011. Again the figures do not fulfill the promise made in the legend to figure 1.

The data presented in the figures are not the current official estimates of the GDP growth to be forecast. They give no hint of revisions long after the fact. The analysis is incorrect.

## 4 comments:

There's something very fundamental I don't understand about testing the predictive power of DSGE and similar models, which I ought to understand and would be grateful if you could explain. Please try to ignore that as a so-called economist myself I should know this already.

These models start by presuming shock processes that drive everything. These models were inherently incapable of predicting the financial crisis, unless by predict you mean "assume a shock process with the occasional very large negative shock." Then you can say, our models says there's going to be a crisis! Not sure when.

Once the shock has hit, the predictions of these models hinge on how large a shock you assume has hit the system, its persistence, and any subsequent shocks etc. How do the people who "test" these models first decide what sort of shock the system has been hit with, before they can then evaluate the consequent predictions?

oh, I think I figured it out - you just say GDP (or whatever) fell x% from date 1 to date 2, and work out what kind of shock would be needed in the model to replicate that, and then that's your shock.

am I right in thinking there's sort of room for double-the-error here that it must be hard to disentangle? first to the extent that the model is wrong you will impute the wrong magnitude of shock, second to the extent that the model is wrong you will predict the wrong response.

I'm still confused though - because if GDP falls again from date 2 to date 3, more than the model predicts, what stops you from saying aha another shock must have hit (or the shock must have persisted)?

Uh I should know that too. But I don't. I don't do any work with time series. I do the sort of macro theory I denounce. I do simple cross country regressions. I do some regressions with micro data. I analyse data on professional forecasters. I never ever forecast.

I will try to answer your question, but be warned that I don't know what I am talking about.

I agree with your hypothesis about what actual macro econometricians do. It isn't very hard if you only allow as many different kinds of shock (technology, aggregate demand, money supply, etc.) as you have time series (GDP, total hours worked, price level, wage level, interest rate). Then you can solve for the shocks.
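A minimal sketch of that exactly identified case, in Python. The two-shock, two-series setup, the impact matrix `B` and all the numbers are made up for illustration and do not come from any of the models discussed. With as many shocks as series, the surprise in the data pins down the shocks by solving a linear system.

```python
def solve_2x2(B, y):
    """Solve B e = y for a 2x2 matrix B using Cramer's rule."""
    (a, b), (c, d) = B
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("impact matrix is singular; shocks are not identified")
    return [(y[0] * d - b * y[1]) / det, (a * y[1] - c * y[0]) / det]

# Hypothetical impact matrix: rows are observed series (output growth, inflation),
# columns are shocks (technology, demand). The numbers are invented.
B = [[1.0, 0.5],
     [-0.2, 0.4]]

# Surprise in the data: observed values minus what the model expected.
surprise = [-2.0, -0.1]

shocks = solve_2x2(B, surprise)
print("implied technology shock:", shocks[0])
print("implied demand shock:", shocks[1])

# Sanity check: feeding the implied shocks back through B reproduces the surprise.
reproduced = [B[i][0] * shocks[0] + B[i][1] * shocks[1] for i in range(2)]
```

Note that the implied shocks are only as good as `B`. If the model's impact matrix is wrong, the imputed shocks are wrong too, which is the first half of the double error raised in the question above.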

It is harder if you have many dimensions of shocks, as when adding measurement error for each of the variables. Then the same change in time series can be due to many different combinations of shocks. Fortunately, a probability distribution of shocks is assumed, so you can get a posterior distribution conditional on the data so far. Then to forecast you integrate.
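The simplest version of this can be sketched in Python under strong assumptions that are not taken from any actual model: a single observed series equal to a structural shock plus measurement error, both Gaussian with known variances, a state that starts at its steady state of zero, and a persistence parameter `rho`. All names and numbers are hypothetical. In this linear Gaussian case the integration over the posterior has a closed form.

```python
def posterior_shock(y_obs, var_shock, var_noise):
    """Posterior mean and variance of the shock e given y_obs = e + m,
    with e ~ N(0, var_shock) and measurement error m ~ N(0, var_noise)."""
    weight = var_shock / (var_shock + var_noise)
    mean = weight * y_obs
    variance = var_shock * var_noise / (var_shock + var_noise)
    return mean, variance

def forecast_next(y_obs, rho, var_shock, var_noise):
    """Forecast of next period's state x' = rho * x + e', integrating over
    the posterior of today's state x (which equals today's shock here)."""
    mean, variance = posterior_shock(y_obs, var_shock, var_noise)
    return rho * mean, rho ** 2 * variance + var_shock

# Hypothetical numbers: the observed series fell 2 points below steady state.
post_mean, post_var = posterior_shock(-2.0, var_shock=1.0, var_noise=0.25)
f_mean, f_var = forecast_next(-2.0, rho=0.5, var_shock=1.0, var_noise=0.25)
print("posterior of shock: mean", post_mean, "variance", post_var)
print("one-step forecast: mean", f_mean, "variance", f_var)
```

With several series and several shocks, the same calculation is done with matrices, usually by a Kalman filter. This scalar case only shows the logic: part of the observed fall is attributed to measurement error, and the forecast is built from what is left.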

But really really very often the model which confronts the data is a linear approximation to the theoretical model near The steady state. This often means that the model used to forecast is a vector autoregression (just regress current values of the variables on a few lags of that variable and all the other variables). The role of all the fancy theory is often just to get restrictions on these regressions. Forecasting is just the fitted value of the regressions. The role of theory is, say, changes in expected inflation must be innovations (unpredictable) given the rational expectations assumption. Or say the long run effect of monetary policy on output must be zero, so some sum of products of coefficients must add up to zero.
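As a rough sketch of what "just regress current values on lags" means, here is an unrestricted two-variable VAR with one lag, estimated by ordinary least squares in plain Python. The data are simulated, the coefficient matrix `A_true` is invented, and the function names are hypothetical. None of the theory-based restrictions mentioned above are imposed, so this is the regression before the theory gets involved.

```python
import random

def solve_2x2(M, v):
    """Solve M x = v for a 2x2 matrix M using Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det]

def fit_var1(series):
    """OLS fit of y_t = A y_{t-1} + u_t (no intercept) for 2-variable data."""
    lagged, current = series[:-1], series[1:]
    # X'X is shared by both equations because they have the same regressors.
    xtx = [[sum(x[i] * x[j] for x in lagged) for j in range(2)] for i in range(2)]
    A = []
    for k in range(2):  # one regression per variable
        xty = [sum(x[i] * y[k] for x, y in zip(lagged, current)) for i in range(2)]
        A.append(solve_2x2(xtx, xty))
    return A

def forecast(A, last, horizon):
    """Iterate the fitted regression forward; each step is a fitted value."""
    path, y = [], list(last)
    for _ in range(horizon):
        y = [A[0][0] * y[0] + A[0][1] * y[1],
             A[1][0] * y[0] + A[1][1] * y[1]]
        path.append(y)
    return path

# Simulate data from a made-up stable VAR(1).
random.seed(0)
A_true = [[0.5, 0.1],
          [0.2, 0.3]]
series = [[0.0, 0.0]]
for _ in range(2000):
    prev = series[-1]
    series.append([
        A_true[0][0] * prev[0] + A_true[0][1] * prev[1] + random.gauss(0, 1),
        A_true[1][0] * prev[0] + A_true[1][1] * prev[1] + random.gauss(0, 1),
    ])

A_hat = fit_var1(series)
path = forecast(A_hat, series[-1], horizon=4)
print("estimated coefficients:", A_hat)
print("four-step forecast path:", path)
```

A restriction of the kind described above would be imposed by estimating the same regressions subject to a constraint on the entries of `A_hat`, which this sketch does not attempt.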

Note that the policy advice is based on the assumption made for convenience that there is only one possible steady state (why I capitalized The) and the untested imposed assumption that the effects of monetary policy on real variables don't last.

The bottom line advice is focus on fighting inflation, because nothing else lasts, as we assumed for convenience and convention.

thanks for response.

You know, I hadn't really thought before about the assumption embedded in these models that policy (monetary, fiscal) has no long-run impact on important real variables like say unemployment or the distribution of real wages. I had previously thought, if I thought about it at all, that was okay because they are supposed to be models of the short-run, used for thinking about how to respond to shocks, not models used for thinking about the long-run and how policy shapes the economy, and haven't thought about what effect policies inspired by these models are having ... I was intrigued by interfluidity's argument that inflation target has held down wage growth ... I would like to see research in that area, maybe it already exists I don't know.
