## Monday, May 13, 2013

### Elementary Statistics

It is almost never wise to look only at the significance levels of statistical tests when attempting to learn anything from data analysis.  The temptation is irresistible, but it is possible to recognise that one has slipped into it.

It has been argued that it is possible to gain a qualitative insight from data analysis by looking only at significance levels and that further analysis is adding "data caveats".  This demonstrates a failure to understand the work of Neyman and Pearson, that is, complete statistical illiteracy.

The problem is that to attempt to communicate with people who don't understand the concepts of null hypothesis, alternative hypothesis, rejection, size and power, I have to appeal to Bayes.  This means I can't resist the other irresistible error of assuming that reasonable people have well defined subjective probability distributions for everything.  This is a demonstrably false hypothesis in psychology (all nulls consisting of this claim and all sets of auxiliary hypotheses anyone can think of so that it is testable have been repeatedly rejected at all standard significance levels).  So I wave my hands and say that if you don't have a prior, make do with something sort of a bit like a prior.

So imagine agent A and an unknown parameter x, and assume that, before the arrival of a new study, A has a prior on x, F_A(x).  There is a prior mean (it would be just the expected value of x if F_A(x) were the probability distribution of x) and a precision of the prior, which is just one over the prior variance (which is the variance x would have if F_A were the probability distribution of x).

Then some data arrive.  The data make it possible to calculate the likelihood of x (this is just equal to the probability density of the data as a function of x; a likelihood function is just another interpretation of a density function).  Let us assume that the likelihood is normal.  This is a common case if x describes the location of a distribution and the data are used to calculate a sample mean, or if x is a probability and the data are used to calculate a frequency.  This means that the likelihood has a mode (the maximum likelihood estimate of x, call it xhat) which is equal to its mean and its median.  Interpreted as a probability density, it also has a variance.

So in what direction do the posterior mean and variance of x change from the prior after the data are used to update them?  There is no simple answer.  It depends on the prior.  One simple case is when the derivative of F_A (the prior density f_A) is symmetric around the prior mean.  In that case the sign of the posterior mean minus the prior mean should be the same as the sign of xhat minus the prior mean.
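A sketch of why the symmetric case works, under the assumptions already stated (prior density f_A symmetric about its mean, normal likelihood). The symbols m (prior mean), \sigma^2 (variance of the estimate), y = x - m and d = \hat{x} - m are notation introduced here for illustration, not the post's own:

```latex
\begin{aligned}
E[x \mid \text{data}] - m
  &= \frac{\int y \, f_A(m+y) \, e^{-(y-d)^2/(2\sigma^2)} \, dy}
          {\int f_A(m+y) \, e^{-(y-d)^2/(2\sigma^2)} \, dy},
  \qquad y = x - m, \quad d = \hat{x} - m \\[4pt]
\int y \, f_A(m+y) \, e^{-(y-d)^2/(2\sigma^2)} \, dy
  &= e^{-d^2/(2\sigma^2)} \int y \, \sinh\!\left(\frac{y d}{\sigma^2}\right)
     f_A(m+y) \, e^{-y^2/(2\sigma^2)} \, dy
\end{aligned}
```

The second line uses e^{yd/\sigma^2} = \cosh(yd/\sigma^2) + \sinh(yd/\sigma^2): the cosh term multiplies an odd function of y and integrates to zero because f_A(m+y) is even in y. What remains, y \sinh(yd/\sigma^2), has the sign of d for every y, so the numerator (and hence the shift in the mean) has the sign of xhat minus the prior mean.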

For a symmetric prior, we know that the location of our belief should move towards the point estimate.  This is true for any precision of the estimate (again, precision just means one divided by variance, that is, one divided by the standard error squared).

The qualitative change, I mean the sign of the change, depends only on the point estimate.  The magnitude depends on "data caveats", that is, the precision of the estimate xhat.

In contrast, a test of the null that x is equal to the prior mean depends only on a z-score: xhat minus the prior mean, times the square root of the precision of the estimate (equivalently, the ratio of xhat minus the prior mean to the standard error).

The direction of the change in belief has nothing in particular in common with a test of the null that the prior mean is the true value.

Even stronger assumptions about the prior make it possible to say a bit more.  If the prior is itself a normal distribution, then the variance of the posterior must be smaller than the variance of the prior.  The posterior precision is equal to the sum of the prior precision and the precision of the estimate.  Nothing like this is true of symmetric priors in general.  It is very possible for new information to cause a posterior which is more spread out than the prior.
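To make the normal-prior case concrete, here is a minimal sketch of the standard conjugate update alongside the z-score for the null that x equals the prior mean. The function names and the numbers (prior mean 0, prior variance 1, xhat 2) are made up for illustration and are not from any study.

```python
import math

def normal_update(prior_mean, prior_var, xhat, est_var):
    """Posterior mean and variance for a normal prior and a normal likelihood."""
    prior_prec = 1.0 / prior_var   # precision = 1 / variance
    est_prec = 1.0 / est_var       # precision of the estimate xhat
    post_prec = prior_prec + est_prec  # precisions add
    post_mean = (prior_prec * prior_mean + est_prec * xhat) / post_prec
    return post_mean, 1.0 / post_prec

def z_score(prior_mean, xhat, est_var):
    """z-score for the null that x equals the prior mean."""
    return (xhat - prior_mean) / math.sqrt(est_var)

# Hypothetical numbers: same point estimate, three different precisions.
prior_mean, prior_var, xhat = 0.0, 1.0, 2.0
for est_var in (100.0, 1.0, 0.01):
    post_mean, post_var = normal_update(prior_mean, prior_var, xhat, est_var)
    z = z_score(prior_mean, xhat, est_var)
    print(f"est_var={est_var}: post_mean={post_mean:.4f}, "
          f"post_var={post_var:.4f}, z={z:.2f}")
```

In all three runs the posterior mean moves in the same direction (towards xhat) and the posterior variance is below the prior variance; only the size of the move and the z-score change with the precision of the estimate.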

Some possible wrong claims

1) People should think this way, because they should have a prior.  Maybe we should, maybe we shouldn't, but telling us to have a prior won't make us have a prior.  To have any sense of what one learned from new data, one has to have some idea what one thought before learning from the new data.  What one thought is almost certainly not an integrable probability measure.

2) Beliefs should move in the direction from the prior mean to the maximum likelihood estimate (provided the likelihood is normal).  Nope, the assumption of a symmetric prior is totally utterly not innocent at all.  It is critical.  An example: the prior is that there is a 90% probability that x is 0 and a 10% probability that x is 100.  The prior mean is 10.  The likelihood is normal with mean 20 and variance 100.  The posterior mean is almost exactly zero.  The likelihood of making those observations if the true mean is 100 (eight standard errors away from xhat) is less than a millionth of a billionth.  If the true mean is zero it is greater than 1/1800.  The posterior is basically certainty that x is zero.  This is true even though xhat is greater than the prior mean.

3) Well, if the prior is symmetric, then new data should give a posterior more concentrated than the prior.
No, what if the prior is x = -1 with probability 0.1, zero with probability 0.8 and 1 with probability 0.1?  The variance corresponding to this distribution is 0.2.  Now assume xhat is 1 with variance 0.5/ln(8).  The posterior has equal probability that x is 0 and 1 (and a low probability that x is -1).  So the variance corresponding to the posterior is greater than 0.25 > 0.2.  The new information has convinced the Bayesian thinker that he knows less about x than he thought he knew.
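The counterexamples in 2) and 3) are easy to check numerically. The sketch below is only an illustration with made-up helper names. For 3) it assumes the prior puts probability 0.8 on zero (which is what the stated prior variance of 0.2 implies) and that the estimate has variance 0.5/ln(8) (which is the value that makes the posterior probabilities of 0 and 1 equal).

```python
import math

def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def discrete_posterior(prior, xhat, est_var):
    """Bayes update of a discrete prior {value: probability} given a
    normal likelihood centred on xhat with variance est_var."""
    weights = {x: p * normal_pdf(xhat, x, est_var) for x, p in prior.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

# Claim 2: asymmetric two-point prior, xhat above the prior mean.
two_point = {0.0: 0.9, 100.0: 0.1}
post2 = discrete_posterior(two_point, xhat=20.0, est_var=100.0)
print("claim 2: prior mean", mean(two_point), "posterior mean", mean(post2))

# Claim 3: symmetric three-point prior (assumed P(x = 0) = 0.8).
three_point = {-1.0: 0.1, 0.0: 0.8, 1.0: 0.1}
post3 = discrete_posterior(three_point, xhat=1.0, est_var=0.5 / math.log(8.0))
print("claim 3: prior variance", variance(three_point),
      "posterior variance", variance(post3))
```

In the first case the posterior mean falls from 10 to essentially zero even though xhat is 20; in the second the posterior variance comes out slightly above 0.25, larger than the prior variance of 0.2.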