Saturday, November 24, 2018

The Jeffreys prior

Over at Twitter I learned a lot. I claimed (and claim) that there is no such thing as an uninformative prior. I also claim that the penalty functions multiplied by likelihoods and called priors are not priors. This led to a debate which was as uninformative as prior debates on the topic. A lot of my obsessions are semantic.

I was also taught about someone called Harold Jeffreys, who presented something which he called a prior. OK, so Wikipedia taught me (Twitter being unsuited for explaining things to me (or anyone)). His prior is proportional to the square root of the determinant of the Fisher information matrix. The Fisher information matrix is -1 times the second derivative of the expected log likelihood with respect to a parameter vector theta, evaluated at the true theta. It is also the variance-covariance matrix of the gradient of the log likelihood at the true theta (the two matrices are identical).
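To make the definitions concrete, here is a minimal sketch (my own toy example, not anything from the Twitter thread) for a single Bernoulli(p) observation: the Fisher information computed both as the expected square of the score and as minus the expected second derivative of the log likelihood, and the Jeffreys prior as its square root.

```python
import numpy as np

def fisher_info_bernoulli(p):
    # Score for one observation x: d/dp log L = x/p - (1-x)/(1-p).
    # Its expectation at the true p is zero, so the variance of the score is
    # E[score^2] = p*(1/p)^2 + (1-p)*(1/(1-p))^2 = 1/(p(1-p)).
    var_score = p * (1.0 / p) ** 2 + (1.0 - p) * (1.0 / (1.0 - p)) ** 2
    # Minus the expected second derivative:
    # -E[d^2/dp^2 log L] = p/p^2 + (1-p)/(1-p)^2 = 1/(p(1-p)).
    neg_expected_hessian = p / p ** 2 + (1.0 - p) / (1.0 - p) ** 2
    assert np.isclose(var_score, neg_expected_hessian)  # the two matrices (here scalars) agree
    return var_score

def jeffreys_unnormalized(p):
    # Jeffreys prior: proportional to the square root of the determinant of the
    # Fisher information; with a scalar parameter the determinant is the scalar itself.
    return np.sqrt(fisher_info_bernoulli(p))

for p in (0.1, 0.5, 0.9):
    print(p, jeffreys_unnormalized(p))
```

Up to normalization this is the Beta(1/2, 1/2) density, the usual textbook example of the Jeffreys prior.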

The Fisher information matrix is a function of theta. A penalty which depends on the Fisher information matrix is a function of theta. It can be called a prior (though I reserve that term for sincere beliefs).

The point of Jeffreys's prior is that it is invariant under any reparametrization of the model. If phi = g(theta) and g is one to one, then the posterior distribution of phi given Jeffreys's prior on phi will imply exactly the same probabilities of any observable event as the posterior distribution of theta given Jeffreys's prior on theta.

This is true because if theta is distributed according to Jeffreys's prior on theta, and phi = g(theta), then phi is distributed according to Jeffreys's prior on phi.

The gradient of the expected log likelihood with respect to theta is the gradient with respect to phi times the Jacobian of g, so the Fisher information matrix in theta is the Jacobian transposed times the Fisher information matrix in phi times the Jacobian, and the square root of its determinant picks up a factor of the absolute value of the determinant of the Jacobian. This means that Jeffreys's prior transforms the way probability densities do, and the Jeffreys prior on theta implies the same distribution of phi as the Jeffreys prior on phi.
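Continuing the Bernoulli toy example (again mine, not from the post), here is that invariance checked numerically for the log-odds reparametrization phi = log(p/(1-p)): the Jeffreys prior written directly in phi equals the Jeffreys prior in p times the Jacobian |dp/dphi|, which is exactly the change-of-variables rule for densities.

```python
import numpy as np

def info_p(p):
    return 1.0 / (p * (1.0 - p))      # Fisher information in the p parametrization

def info_phi(phi):
    p = 1.0 / (1.0 + np.exp(-phi))    # invert phi = log(p/(1-p))
    return p * (1.0 - p)              # Fisher information in the log-odds parametrization

for p in (0.2, 0.5, 0.8):
    phi = np.log(p / (1.0 - p))
    dp_dphi = p * (1.0 - p)           # Jacobian of the inverse map, dp/dphi
    lhs = np.sqrt(info_phi(phi))      # Jeffreys prior written directly in phi
    rhs = np.sqrt(info_p(p)) * dp_dphi  # Jeffreys prior in p, transformed as a density
    print(p, lhs, rhs, np.isclose(lhs, rhs))
```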

I am quite sure this is simply because the gradient of the expected log likelihood with respect to theta is the gradient of a scalar valued function of theta. For any scalar valued h(theta) I think the square root of the determinant of the expected value of (the gradient of h)(the gradient of h)' would work just as well. For example, if one used the gradient of the likelihood rather than the log likelihood, I think the resulting prior would be invariant as well.
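Here is the same kind of numerical check for that broader claim, with a scalar parameter and an h and g of my own choosing (nothing special about them): by the chain rule, |h'(theta)| transforms under phi = g(theta) exactly the way a density does.

```python
import numpy as np

# Arbitrary smooth scalar function of theta and an arbitrary one-to-one reparametrization.
h = lambda t: np.sin(t) + t ** 3       # "prior" built from |h'(theta)|
g = lambda t: np.exp(t)                # phi = g(theta)
g_inv = np.log                         # theta = g_inv(phi)
h_tilde = lambda ph: h(g_inv(ph))      # h written as a function of phi

def num_deriv(f, x, eps=1e-6):
    # Central finite-difference derivative.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for theta in (0.3, 1.0, 2.0):
    phi = g(theta)
    lhs = abs(num_deriv(h_tilde, phi))                            # |d h_tilde / d phi|
    rhs = abs(num_deriv(h, theta)) * abs(num_deriv(g_inv, phi))   # |h'(theta)| * |d theta / d phi|
    print(theta, lhs, rhs, np.isclose(lhs, rhs, rtol=1e-4))
```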

Now, except for the log likelihood evaluated at the true theta, the expected Hessian (second derivative) is not equal to minus the expected product of the gradient and the gradient prime. That implies that for every h() except the log likelihood there are two invariant priors: the square root of the determinant of the expected value of (the gradient times the gradient prime), and the square root of the determinant of minus the expected value of the second derivative.
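To illustrate the first sentence with the Bernoulli example one more time (my own choice of h, not anything from the post): for the log likelihood at the true p the two expectations coincide, which is the usual information matrix equality, while for the likelihood itself they do not.

```python
# Compare E[(dh/dp)^2] with -E[d^2 h/dp^2] at the true p, over x ~ Bernoulli(p),
# using analytic derivatives for a single observation x in {0, 1}.

def moments(dh, d2h, p):
    e_grad_sq = p * dh(1, p) ** 2 + (1 - p) * dh(0, p) ** 2
    e_neg_hess = -(p * d2h(1, p) + (1 - p) * d2h(0, p))
    return e_grad_sq, e_neg_hess

# h = log likelihood: x*log(p) + (1-x)*log(1-p)
dlog = lambda x, p: x / p - (1 - x) / (1 - p)
d2log = lambda x, p: -x / p ** 2 - (1 - x) / (1 - p) ** 2

# h = likelihood: p^x * (1-p)^(1-x), i.e. p if x == 1, else 1 - p
dlik = lambda x, p: 1.0 if x == 1 else -1.0
d2lik = lambda x, p: 0.0

p = 0.3
print("log likelihood:", moments(dlog, d2log, p))   # both equal 1/(p(1-p)) = 4.76...
print("likelihood:    ", moments(dlik, d2lik, p))   # (1.0, -0.0): not equal
```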

I think this means that the set of invariant priors is basically about as large as the set of possible probability distributions of theta. Given a prior over theta, invariance implies a prior over any one to one function of theta, but this seems to me to be a statement about how to transform priors when one reparametrizes (which is just the formula for calculating the probability density of a function of a variable with a known probability density).

The log likelihood is a very popular function of parameters and data, but I see no particular reason why a distribution calculated using the log likelihood is more plausible than any other distribution. I don't see any particular appeal of the Jeffreys prior. I think one does just as well by choosing a parametrization and assuming a flat distribution for that parametrization.

I don't think I have ever seen the Jeffreys prior in practice; that is, I don't think I have ever seen it used.
