
Wednesday, October 05, 2016

Benchmarks, Models, and Hypotheses

I have been wondering about the frequent use and alarming rhetorical power of the word "benchmark". It often appears in the phrase "benchmark model," which is inconvenient, because I want to contrast benchmarks and models and don't want to write about the difference between benchmark models and other models.

Here I use "hypothesis" for a collection of statements which we think might be true, such that we are eager to find out whether they are all true; "model" for a collection of statements which we know are false but which might be a useful approximation to the truth; and "benchmark" for a model which we use only by contrasting it with models which we think might be useful approximations to the truth.

I imagine hopes followed by disappointments in the following order.

1) The (compound, complex) statement P might be true, and P implies Q, which we can observe.

2) Q is false so P isn't true, but P might still be a useful approximation to the truth because other implications of P are approximately true.

3) All attempts to use P to approximate reality have failed, because each implication is far from the truth. P has been modified every time we try to use it, so that the implication (which would be useful if correct but which is incorrect) is eliminated. We can fit an observed pattern after observing it, but we continually fail to predict anything correctly. Work starting with P shares the fault of totally undisciplined empiricism: it can describe but not forecast.

4) However, P is a useful benchmark. We can understand each of the stylized facts by remembering why each proves P false and by noting how P had to be modified to fit the fact.

I think macroeconomics is reaching the 4th stage. The DSGE models which have dominated academic work for decades are based on assumptions which (it is now asserted) were always assumed to be false. They are not especially useful for forecasting (and it is now asserted that they were never meant to be used to forecast). They offer limited guidance for policy in a crisis, because the crisis occurs exactly when one of the standard assumptions fails. However, they are still used as benchmarks. New models are presented as modifications of a standard model, with one modification made per article. Insights are obtained, because the modified assumption must cause the difference in results between the benchmark model and the new model.

My view is that the claim that something is a useful benchmark might be false.

In fact, I think it is similar to the claim that a model is a useful approximation to reality. A model is a useful approximation if it gives approximately accurate conditional forecasts: it is used by calculating what outcomes different policies would cause if the model were the truth, and it is useful if those predictions of outcomes conditional on policies are approximately accurate. The useful model is used to understand approximately how things would be different if different policies were implemented. Similarly, a benchmark model is used to understand how things would be different if different assumptions were true. So we determine the effect of, say, some financial friction by comparing a new DSGE model with the financial friction to the standard DSGE model without it. Again the effort is to see how changing something changes outcomes. The difference might be that policy makers can't really eliminate the financial friction, so the actual outcome is compared to something which can be imagined but not achieved. However, the claims are roughly equally strong. Blanchard discusses considering a policy and considering a distortion as if they were the same sort of considering: "They can be useful upstream, before DSGE modeling, as a first cut to think about the effects of a particular distortion or a particular policy".

I think the choice of a benchmark is important because one modification is considered at a time. If implications were a linear function of assumptions, then it wouldn't matter to which model one made a change. But they aren't. The way in which an unrealistic DSGE model differs from the same model with a financial friction can be completely different from the way in which the real world would be different if a financial friction were eliminated.
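To put that point in symbols (my notation, a minimal formalization of the linearity claim): write the mapping from a vector of assumptions to implications as a function f, let a be the benchmark's assumptions, b the true ones, and Δ the modification (say, adding a financial friction). Then:

```latex
% If f were linear, the effect of a modification would not depend on the baseline:
\[
f(x) = Lx + c \;\Longrightarrow\; f(a+\Delta) - f(a) \;=\; L\Delta \;=\; f(b+\Delta) - f(b).
\]
% For nonlinear f there is no such guarantee: DSGE-with-friction minus
% DSGE-without-friction need not resemble world-with-friction minus world-without.
```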

But I think there is a more important problem with accepting a DSGE model as at least a useful benchmark. The result has been that the vast majority of models in the literature share many of the implications of the benchmark model. So, for example, if the benchmark model has Ricardian equivalence, so do most of the modified models. The result is that if one surveys the literature and attempts to see what it seems to imply about the effects of the timing of lump sum taxes, it sure seems to imply that there are probably no such effects. Most models imply no effect. The possibility of improving outcomes with temporary lump sum tax cuts is not discussed. When such cuts were proposed in the USA in 2009 (as part of the ARRA stimulus bill) many economists argued that policy makers were ignoring the results of decades of academic research. In fact, they were ignoring the implication of the standard benchmark model, which was used, just as a benchmark, in spite of its poor performance.
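For concreteness, here is the textbook two-period logic behind that shared implication (a standard statement of Ricardian equivalence, not anything from a particular DSGE model): the household's budget set depends only on the present value of lump sum taxes, so their timing drops out.

```latex
% Household budget constraint with lump sum taxes T_1, T_2 and interest rate r:
\[
c_1 + \frac{c_2}{1+r} \;=\; (y_1 - T_1) + \frac{y_2 - T_2}{1+r}.
\]
% The government faces its own intertemporal constraint, so a debt-financed cut
% in T_1 implies a higher T_2 with T_1 + T_2/(1+r) unchanged. The budget set,
% and hence consumption, is unaffected: the timing of lump sum taxes has no effect.
```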

This is the same pattern seen following the stronger hope that a model might be a useful approximation. The model is introduced with the expressed hope that it might be a useful approximation. Implications are derived. They turn out to be false. It is noted that models are false by definition, and that other implications of the model might be useful approximations. After years or decades, the model is no longer used by specialists in the field. However, it is still presented to outsiders as a useful first order approximation when it isn't. In this context "first order" means "according to the first model developed by my school of thought".

In both cases, actual practical implications are derived through a process which is completely invulnerable to evidence.

I am writing this because the more diplomatic critics of mainstream academic macroeconomics insist that the models which they find unsatisfactory are useful benchmarks.

Here is an example from a not so diplomatic critic. I think this claim is made without any consideration of the possibility that it might be false, and, indeed, a damaging falsehood. It is the least one can say if one isn't willing to tell people that they have wasted decades of their working lives. But that doesn't mean that it isn't more than one should say.

A simple example illustrates the danger of changing one assumption at a time. The model is just the original Lucas supply function. The idea is that output is chosen by suppliers who don't observe the price level, so it is equal to the actual price level minus the rational forecast of the price level. This implies that output is white noise and that the location of the distribution of output doesn't depend on the behavior of the price level, and therefore doesn't depend on monetary policy. With a standard assumption (or approximation) it implies that the expected value of output conditional on data available to agents is a constant which doesn't depend on monetary policy. This is the policy ineffectiveness proposition, which led Sargent and Wallace to note that, in their model, the optimal policy was to set the inflation rate to some desired target and ignore everything else. Notably, this is the policy mandate of the European Central Bank. There are two counterarguments, neither of which amounts to much on its own. The first is that agents in the model are assumed to have rational expectations and so automatically know the policy rule. It is much more reasonable to assume that agents are boundedly rational and learn the policy rule. It was correctly argued that, given the other assumptions, this learning will have only temporary effects and that the rational expectations assumption will become true in the long run. The second: it was later argued (based on massive evidence) that the current unemployment rate affects the future non-accelerating-inflation rate of unemployment, that is, that cyclical unemployment becomes structural, that is, that there is hysteresis. In this case, supply depends not only on price level prediction errors but also on the time varying natural rate. It was correctly argued that, in this model, the optimal policy was to target inflation -- the expected level of output didn't depend on policy. Here, in passing, it is worth noting that the additional assumptions mentioned above which were required to get from "location" to "expected value" become critical [Cite Pelloni et al].
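For readers who want the algebra, here is a minimal statement of the supply function and the policy ineffectiveness result (standard notation; the coefficient β and the shock ε are made explicit here, while the verbal version above sets β = 1):

```latex
% Lucas supply: output deviates from its natural level only via price surprises.
\[
y_t \;=\; \bar{y} + \beta\,\bigl(p_t - E[\,p_t \mid I_{t-1}\,]\bigr) + \varepsilon_t .
\]
% Under rational expectations the forecast error has conditional mean zero under
% ANY systematic policy rule, so
\[
E[\,y_t \mid I_{t-1}\,] \;=\; \bar{y}
\]
% regardless of monetary policy: the policy ineffectiveness proposition.
```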

But consider a newly installed monetary authority setting policy for an economy populated by boundedly rational agents who have to learn the policy rule. The authority should ask what would happen if she were less of an inflation hawk than people expect (not with rational expectations, but with the actual beliefs of the boundedly rational agents in the economy). The result would be temporarily higher output while agents learn. This would cause permanently higher output because of hysteresis. Alone, each of boundedly rational learning and hysteresis does not change the optimal policy. Together they change everything. The rule that only one change in the benchmark model is considered at a time can prevent people from seeing this. In fact, I think it has prevented most macroeconomists from seeing this.
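A minimal numerical sketch of that interaction, assuming simple constant-gain adaptive learning and a linear hysteresis rule (the parameter values and functional forms are illustrative assumptions, not taken from any paper):

```python
# Sketch: Lucas supply + adaptive learning of the policy rule + hysteresis.
# All parameters and functional forms are illustrative assumptions.

beta = 1.0   # output response to inflation surprises
g    = 0.2   # learning gain: how fast beliefs converge to the actual rule
h    = 0.5   # hysteresis: share of a boom that becomes structural
T    = 200   # periods to simulate

pi_expected = 0.0   # agents' inherited belief about the inflation target
pi_actual   = 2.0   # the new, more dovish authority's actual target
y_natural   = 0.0   # structural (natural) level of output, normalized to 0

for t in range(T):
    surprise = pi_actual - pi_expected            # inflation forecast error
    y = y_natural + beta * surprise               # Lucas supply function
    y_natural += h * (y - y_natural)              # hysteresis: booms raise y_natural
    pi_expected += g * surprise                   # constant-gain learning

# The surprise decays geometrically, but y_natural accumulates each period and
# converges to beta*h*(initial surprise)/g > 0: the boom is locked in permanently.
print(f"long-run output level: {y_natural:.2f}")
```

Setting h = 0 gives the learning-only case (output returns to its old level once beliefs converge), and starting with pi_expected = pi_actual gives the rational expectations case (no surprise, no boom). Only with both frictions present does the dovish deviation have a permanent effect.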

OK, amateur partisan intellectual history after the jump.

A testable hypothesis always includes the core hypothesis of interest and auxiliary hypotheses required to obtain testable predictions (so Newton's model of the solar system includes the core hypotheses of his law of gravity and laws of motion, and the auxiliary hypotheses that the sun and planets are rigid spheres and that the effects of all forces but gravity are negligible). The problem is that the so-called core hypotheses of the PIH, REH and EMH (the permanent income, rational expectations and efficient markets hypotheses) are no such thing. They are, in fact, always the same non-hypothesis: that, ex post, one can find some utility function such that the actions of agents are consistent with rational maximization of the expected value of that utility function. This is true, because it must be true. It is agreed (and easily demonstrated) that the assumption that agents maximize something has no implications at all without some further assumptions about what they maximize. The core hypothesis is not falsifiable. If rejection due to failure of auxiliary hypotheses is not considered a reason to abandon the research program, then the research program is completely invulnerable to evidence.
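The logical structure here (a standard Duhem-Quine point, stated in my notation) is worth spelling out:

```latex
% A test requires the conjunction of core and auxiliary hypotheses:
\[
(H_{\mathrm{core}} \wedge H_{\mathrm{aux}}) \Rightarrow Q, \qquad
\neg Q \;\Rightarrow\; \neg H_{\mathrm{core}} \vee \neg H_{\mathrm{aux}}.
\]
% Observing not-Q rejects only the conjunction. If every rejection is blamed on
% H_aux, then H_core is never at risk; and if H_core alone implies nothing
% testable, the research program cannot be falsified at all.
```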

This is a deadly problem, but I want to write about a different less important problem.

I am very irritated by the phrase "all models are false by definition". It mocks model testers who have demonstrated that some model has false implications. The implication is that the model testers misunderstood the aim of the model developers, incorrectly perceiving a model to be a hypothesis. Foolish salt water economists decided for some silly reason that the permanent income hypothesis, the rational expectations hypothesis and the efficient markets hypothesis were hypotheses. I claim that this shows bad faith. A statement is a hypothesis (with the associated scientific dignity) until it is proven false; then it turns out that it was always a model, and the people who proved the statement false are silly.

The repeated use of the word "hypothesis" in the '50s, '60s and '70s strongly suggests that the equations in question were not originally considered parts of models which were false by definition. Thomas Sargent's phrase "take a model seriously" sure seems to imply "treat a model as a null hypothesis." And, in fact, Sargent once said (original pdf download here) that Lucas and Prescott were enthusiastic about hypothesis testing until he falsified too many of their hypotheses:

My recollection is that Bob Lucas and Ed Prescott were initially very enthusiastic about rational expectations econometrics. After all, it simply involved imposing on ourselves the same high standards we had criticized the Keynesians for failing to live up to. But after about five years of doing likelihood ratio tests on rational expectations models, I recall Bob Lucas and Ed Prescott both telling me that those tests were rejecting too many good models. The idea of calibration is to ignore some of the probabilistic implications of your model but to retain others.

7 comments:

Philip said...

"My view is that either the claim that a something is a useful benchmark might be false." Is part of the sentence missing?

You say: "But I think there is a more important problem with accepting a DSGE model as at least a useful benchmark. The result has been that the vast majority of models in the literature share many of the implications of the benchmark model."

In fact, you must go farther back than the original DSGE models to the General Equilibrium model. The current impression is that whatever the problems with DSGE, General Equilibrium itself is fine.

I show in the link below that General Equilibrium theory is no different from Marshallian analysis. The two are really mathematically equivalent, which is why both, for example, conclude that there is no such thing as involuntary unemployment.

The mathematical equivalence of Marshallian analysis and General Equilibrium theory

reason said...

Philip - YES, finally someone who puts the finger on the sore point.

I must say that I was really, really annoyed by Olivier Blanchard saying:
https://piie.com/blogs/realtime-economic-issues-watch/further-thoughts-dsge-models
"I believe that there is wide agreement on the following three propositions; let us not discuss them further, and move on:
1. Macroeconomics is about general equilibrium."

My comment (at Mark Thoma's place) http://economistsview.typepad.com/economistsview/2016/10/links-for-10-04-16.html#comment-6a00d83451b33869e201b8d22580b4970c:
" But I am not so much against "equilibrium", but against the concept of "general equilibrium" because I think the system is an evolutionary system and never even approaches a "global equilibrium" (i.e the system changes faster than it converges).

Also it is well known what the conditions for a general equilibrium are, and equally well known that they can never be fulfilled.

My view is that what Olivier Blanchard is effectively saying is that macro-economics is about flying unicorns. Any models that don't include flying unicorns are not macro-economic models. But I would then ask him: what is the most famous book in all of macro-economics? Surely, it is J.M. Keynes' "The General Theory of Employment, Interest and Money". No mention here of General Equilibrium. (In fact the "General" in the title was to distinguish his theory from the classical theory, which he thought was a special case of his more general theory.) Is this book in fact not about macro-economics then?"


reason said...

Maybe flying unicorns are not the correct analogy; maybe it is better to talk about cosmology and dark matter: something, perhaps mythological, that has a theoretical existence but for which there is no evidence. Surely, there can be cosmology without dark matter.

reason said...

I think what he might have wanted to say with that statement is that macro-economics considers the interaction of all markets, that ceteris paribus assumptions are not allowed. But what he actually said is something quite different and much more restrictive, perhaps out of the habit of thinking that way.

reason said...

In my own view, all useful macro-economics is disequilibrium macro-economics. If by lucky chance we had general equilibrium, then the model would be redundant except as an academic exercise.

Robert said...

Dear Philip

In fact there was an excess word (not something missing). The word "either" shouldn't have been there. I have deleted it.

I don't know what I had been planning to write. Either I was just being stupid (ooops I did it again).

Thanks for pointing out my braino.

reason said...

Robert,
Philip beat me to it - with General Equilibrium as well. I was just about to point it out when I saw Philip had already done it.
But it just goes to show you, you are being read carefully. What you say is usually worth reading. Thanks for linking to Mean Squared Errors the other day.