I will assume that unemployment is a function of actual inflation minus expected inflation. I will also assume that people are smart enough that no policy will cause them to make forecast errors of the same sign period after period after period.
Friedman's conclusion follows if there is the additional assumption that expected inflation is a constant plus a linear function of lagged inflation. In this case, unless the coefficients sum to one and the constant is zero, it is possible to cause a constant nonzero forecast error. It can't be that people are dumb enough to stick with a constant plus a linear function with coefficients that sum to anything but one if that rule is exploited to make surprise inflation always positive.
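To spell out the arithmetic behind that claim, here is a sketch in notation that is not in the original argument: a is the constant, b_i are the lag coefficients, B is their sum, and e is the forecast error when the monetary authority holds inflation constant at some level \bar\pi.

```latex
\begin{align}
\pi^e_t &= a + \sum_{i=1}^{k} b_i \, \pi_{t-i}, \qquad B \equiv \sum_{i=1}^{k} b_i \\
\pi_t = \bar\pi \ \text{for all } t \quad &\Longrightarrow \quad \pi^e_t = a + B \bar\pi \\
e \equiv \bar\pi - \pi^e_t &= (1 - B)\,\bar\pi - a
\end{align}
```

If B is not one, any desired constant error e can be produced by choosing \bar\pi = (a + e)/(1 - B). If B is one and a is not zero, every constant inflation rate gives the same nonzero error, -a. Only with B equal to one and a equal to zero does constant inflation always give zero error, so that a constant positive surprise requires ever-rising inflation.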
However, I know of no one who ever wrote that such a simple model of expectations is the truth. Rather, some people, including Cagan, Friedman, Tobin, and Solow, asserted that something like that is a useful approximation at some times in some places. Many authors expressed belief in a more complicated story in which inflation expectations are anchored if inflation is low and variable but not anchored if inflation is high and/or steady.
I think such expectations can be modelled either as a result of boundedly rational learning with hypothesis testing or rational Bayesian updating. I will try to do so (famous last words of this post which sensible readers will read).
The monetary authority will, in fact, stick to a simple rule (but agents do *not* know that the rule never changes). It can target inflation and is tempted to trick firms into supplying more than they would under perfect foresight by setting actual inflation higher than expected inflation. I will assume that perfect inflation forecasting causes unemployment to be 5%. This is the non-accelerating inflation rate of unemployment (NAIRU). Unemployment is linear in the inflation expectations error, so long term average unemployment falls short of 5% in proportion to the long term average expectations error.
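In symbols, this assumption can be written as follows. The slope k > 0 is a hypothetical parameter introduced only for this sketch; the post itself fixes the 5% intercept and the linearity, not a particular slope.

```latex
\begin{align}
u_t &= 5 - k \,(\pi_t - \pi^e_t), \qquad k > 0 \\
\frac{1}{T} \sum_{t=1}^{T} u_t &= 5 - k \cdot \frac{1}{T} \sum_{t=1}^{T} (\pi_t - \pi^e_t)
\end{align}
```

So average unemployment is below 5% exactly when the average forecast error is positive.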
The simple rule may be stochastic with targets based on coins flipped and dice rolled etc in secret. The monetary authority wants low unemployment and low inflation.
The question is: can the long term average unemployment rate be lower than 5%?
First, bounded rationality with hypothesis testing. The bounded rationality is forecasting with a simple rule which might include parameters estimated by OLS on old data. In the very simplest rule, expected inflation is 2% no matter what. The hypothesis testing part is that forecasting rules are assumed to be ordered from a first rule to a second, etc. When agents use rule n, they also test the null that rule n gives optimal forecasts against the alternative that rule n+1 gives better forecasts. They switch to rule n+1 if the null is rejected at the 5% level (as always this can be any level, and as always I choose 5% because everyone does). I will assume that rules are also ordered so that if rule n gives persistent underestimates of future inflation, rule n+1 gives higher forecasts.
Forecasting rule 1 is forecast inflation equals 2%. Rule 2 is forecast inflation is equal to a constant estimated by OLS. Rule 3 is forecast inflation is equal to an estimated constant plus an estimated coefficient times lagged inflation. Rule 4 is a regression on two lags of inflation. The series of rules goes on to infinity, always adding more and more parameters to be estimated, and includes the actual inflation rule (the monetary policy rule for this silly model).
Friedman's story about accelerating inflation at 3% unemployment works in this model. Rule 2 is flexible enough for his example. If inflation is higher than forecast inflation by a constant, the estimated constant term in the regression grows without bound.
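A minimal sketch of that mechanism, under assumptions that are mine and not part of the argument above: agents start with a single observation of 2% inflation, the OLS constant is just the sample mean of all inflation seen so far, and the authority sets inflation a fixed `surprise` above whatever the current forecast is. The function name and parameters are hypothetical.

```python
def simulate_rule2(surprise=2.0, periods=1000, initial_inflation=2.0):
    """Rule 2 forecasts (sample mean) when inflation always beats the forecast.

    Returns the list of forecasts and the list of realized inflation rates.
    """
    history_sum = initial_inflation  # running sum of observed inflation
    history_len = 1                  # number of observations so far
    forecasts, inflations = [], []
    for _ in range(periods):
        # OLS on a constant alone is the sample mean of past inflation.
        forecast = history_sum / history_len
        # The authority engineers a constant positive forecast error.
        inflation = forecast + surprise
        forecasts.append(forecast)
        inflations.append(inflation)
        history_sum += inflation
        history_len += 1
    return forecasts, inflations


forecasts, inflations = simulate_rule2()
print(forecasts[0], forecasts[9], forecasts[99], forecasts[999])
```

In this sketch each new observation raises the estimated constant by the surprise divided by the new sample size, so the forecast follows a harmonic sum: it grows without bound, but slowly, because all past data are weighted equally.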
A key necessary assumption is that agents never accumulate more than a finite amount of data about the monetary authority. A sensible way of putting this is that learning about the Federal Open Market Committee restarts each time a new Fed chairman is appointed. To make things not too easy for myself, I assume that once agents pass from rule 1 to rule 2 they stick with it, using all data to estimate parameters. The data used to test the current rule against the next one are only those accumulated with the current chairman. I will assume chairmen are replaced at known fixed intervals of, say, 100 periods of time.
Federal Open Market Committee members know all this. They can set inflation so the 2% forecast rule is never rejected against the estimated constant. The optimal strategy will be mixed, that is, they will randomize inflation so it isn't too easy to learn what the best estimated constant is. I will assume they set inflation equal to a constant plus a mean zero white noise disturbance term (to be clear, the expected value of the random term conditional on lagged information is always zero).
Clearly FOMCs can set inflation to be 2.000001% plus a mean 0 variance 1 disturbance term without getting caught before the chairman's term expires. This means that the long run average unemployment rate can be less than the NAIRU. This means that there is a long run tradeoff between average unemployment and average inflation.
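A rough simulation of this claim, with several assumptions of my own labeled here and in the code: the test of rule 1 against rule 2 is taken to be a two-sided t-test that the mean forecast error is zero, the disturbance is Gaussian, and the critical value 1.984 is an approximate 5% value for 99 degrees of freedom (so it is only appropriate for 100-period terms). The function names are hypothetical.

```python
import random
import statistics

# Assumption: approximate two-sided 5% critical value of Student's t with
# 99 degrees of freedom. Only valid for terms of 100 periods.
T_CRIT = 1.984


def chairman_term(rng, target=2.000001, anchor=2.0, periods=100):
    """Simulate one chairman's term and test rule 1 (forecast = anchor)."""
    # Forecast errors of the "inflation is 2%" rule under target-plus-noise policy.
    errors = [target + rng.gauss(0.0, 1.0) - anchor for _ in range(periods)]
    mean_error = statistics.fmean(errors)
    std_error = statistics.stdev(errors) / periods ** 0.5
    # Reject rule 1 if the mean forecast error is significantly nonzero.
    rejected = abs(mean_error / std_error) > T_CRIT
    return rejected, mean_error


def simulate_terms(terms=2000, seed=0):
    """Return the share of terms in which rule 1 is rejected, and the mean error."""
    rng = random.Random(seed)
    outcomes = [chairman_term(rng) for _ in range(terms)]
    rejection_rate = sum(1 for rejected, _ in outcomes if rejected) / terms
    average_error = statistics.fmean([m for _, m in outcomes])
    return rejection_rate, average_error


rate, avg_error = simulate_terms()
print(f"rule 1 rejected in {rate:.1%} of terms; average forecast error {avg_error:.4f}")
```

The rejection rate should come out close to the nominal 5%, which is what the test would give if inflation were exactly 2% plus noise. The 0.000001 bias holds only in expectation; in any finite simulation the average forecast error is dominated by sampling noise.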
Update: we have a winner, so the offer immediately below has expired.
[If anyone has read this far, please tell me in comments and I will praise you to the sky in a new post.]
Friedman's argument can be true, unemployment can depend only on expectation errors and there can be a long run inflation unemployment tradeoff. There is a big difference between trying to achieve constant unemployment lower than the NAIRU and trying to achieve average unemployment lower than the NAIRU. Friedman also implicitly assumed that the monetary authority never changes and is known to never change.
Basically his implicit assumption is either the Fed can set unemployment to any constant or there is a natural rate. This doesn't follow for many many reasons. I just described one.
OK, I talked about Bayesian learning. This post is already way too long. The idea with Bayesian learning is that we start with a prior with a huge mass on the model in which inflation is 2% plus a mean zero disturbance term. Then there are positive prior probabilities on a huge variety of other models. However, all of the other models have time varying coefficients which follow random walks. This means that the forecast conditional on belief in model N depends on parameters estimated with exponentially weighted lagged data. This means that, given the 2.000001% plus noise rule, the ratio of the likelihood for those models to the likelihood with the 2% plus noise model doesn't grow without bound. This means that the posterior keeps a huge mass on 2% and there is a long run tradeoff between long run average unemployment and long run average inflation.
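A sketch of the likelihood ratio argument with a single competing model, under assumptions of my own: the alternative is a local level model (inflation equals a random walk level plus noise) filtered with a standard Kalman recursion, the disturbance is Gaussian with variance 1, and the random walk innovation variance `drift_var` is an arbitrary illustrative value. None of these specifics come from the post.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood_ratio_path(periods=5000, target=2.000001, anchor=2.0,
                              noise_var=1.0, drift_var=0.01, seed=0):
    """Cumulative log likelihood ratio: random-walk-level model over 2%-plus-noise."""
    rng = random.Random(seed)
    level, level_var = anchor, drift_var  # alternative model's belief about the level
    log_ratio, path = 0.0, []
    for _ in range(periods):
        inflation = target + rng.gauss(0.0, math.sqrt(noise_var))
        # Alternative model: the level may have drifted since last period.
        pred_var = level_var + drift_var
        log_alt = log_normal_pdf(inflation, level, pred_var + noise_var)
        # Anchored model: inflation is 2% plus noise, nothing to estimate.
        log_anchor = log_normal_pdf(inflation, anchor, noise_var)
        log_ratio += log_alt - log_anchor
        path.append(log_ratio)
        # Kalman update; in steady state this is an exponentially weighted average.
        gain = pred_var / (pred_var + noise_var)
        level += gain * (inflation - level)
        level_var = (1.0 - gain) * pred_var
    return path


path = log_likelihood_ratio_path()
print(f"final log likelihood ratio: {path[-1]:.1f}, maximum along the way: {max(path):.1f}")
```

In this sketch the alternative model keeps paying for its time varying level with noisier forecasts, so the log likelihood ratio should drift down rather than up, and a large prior mass on the 2% model is never overturned.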