
Saturday, May 21, 2005

Rejecting the Alternative II

Elementary Statistics on the Front Page of the New York Times again.

Rejecting the alternative hypothesis is an elementary error in statistics. In the Neyman-Pearson framework (hypothesis testing), the null hypothesis can be rejected against an alternative. A failure to reject the null is not evidence against the alternative. I have mentioned this error before here.

Today's case is very blatant.

Benedict Carey writes:

But in the only rigorously controlled trial so far in depressed patients, the stimulator was no more effective than surgery in which it was implanted but not turned on.


In the study, doctors implanted the device in 235 severely depressed people. The stimulator sends timed pulses of electricity to the vagus nerve, which has wide connections throughout the brain.

Half of the patients then had their stimulators turned on. The investigators did not know which of their patients had their stimulators on.

After three months, researchers "unblinded" the study and compared levels of depression in the two groups based on standard measures of disease severity, the F.D.A. documents show. They found that 17 of the 111 patients who had implants turned on and completed the trial showed significant improvement. But 11 of 110 who had no stimulation and completed the trial also felt significantly better. The difference between the two groups was small enough to be attributable to chance.

It appears that Mr. Carey is unaware of the subtle mathematical point that 17/111 is greater than 11/110. The error is made worse by the fact that the false claim that the two proportions are the same appears before the jump, while the actual numbers, which are not the same, appear only after the jump.

Now the difference does not reject the null that the two rates are the same, that is, that the treatment is ineffective. Under the null, the probability that 17 or more of the 28 positive responses fall in the treatment group is (roughly, using a normal approximation to the binomial) 13%. To be careful, people tend to use a two-tailed test, that is, to ask for the chance that 17 or more of the positive responses are in the treated group plus the chance that 17 or more are in the control group. That test would reject the null only at (very roughly) the 26% level, far above conventional significance levels.
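The arithmetic can be checked in a few lines. The counts come from the article; the rough 13% figure can be reproduced with the same kind of normal approximation described above (treating each of the 28 responses as landing in the treated arm with probability 111/221 under the null). This is a sketch of one way to do that calculation, not necessarily the exact computation behind the figure in the post:

```python
import math

# Trial counts from the article: 17 of 111 treated patients responded,
# 11 of 110 controls responded.
treated_n, treated_resp = 111, 17
control_n, control_resp = 110, 11

p_treated = treated_resp / treated_n   # about 0.153
p_control = control_resp / control_n   # about 0.100
assert p_treated > p_control           # 17/111 really is greater than 11/110

# Under the null (treatment does nothing), each of the 28 responders is in
# the treated arm with probability ~111/221.  Normal approximation to the
# binomial upper tail P(X >= 17), no continuity correction.
n_resp = treated_resp + control_resp          # 28
p_null = treated_n / (treated_n + control_n)  # about 0.502
mean = n_resp * p_null
sd = math.sqrt(n_resp * p_null * (1 - p_null))
z = (treated_resp - mean) / sd                # about 1.11

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_one_tailed = normal_upper_tail(z)  # roughly 0.13
p_two_tailed = 2 * p_one_tailed      # roughly 0.27
```

An exact binomial tail (summing the probability mass from 17 to 28) comes out somewhat higher than the normal approximation, but either way the p-value is nowhere near conventional significance levels.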

It is, indeed, very strange that the FDA is considering approval of a treatment supported by such weak evidence. Like various experts quoted in the article, I would have expected the FDA advisory panel to tell Cyberonics Inc., the Houston company that makes the stimulator, that the device would be approved only after they performed a larger study, and then only if the pooled results were significant. If the point estimate of the benefit were exact, the study would need to be quadrupled, so the new sample would have to be about 330 patients, half treated and half controls. If the device happened to be almost exactly as effective as the current data (weakly) suggest, such a study would have a 50-50 chance of resolving the question.
At the cost of $15,000 per patient mentioned in the article, it would cost about $5 million (rounding up a bit for the cost of keeping FDA-quality records). I suspect that Cyberonics claimed it couldn't afford such a study.
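The quadrupling argument rests on the fact that a z statistic scales like the square root of the sample size: multiply every count by four and z doubles. The post's rough 13% one-tailed figure corresponds to a z of about 1.11, so a quadrupled study with the same point estimates would land near 2.22, just past the two-sided 5% cutoff of 1.96. A sketch of that scaling, assuming the observed response rates held exactly in the larger sample:

```python
import math

def z_from_trial(n_resp, treated_resp, p_null):
    """z statistic for the count of responders in the treated arm,
    normal approximation to the binomial under the null."""
    mean = n_resp * p_null
    sd = math.sqrt(n_resp * p_null * (1 - p_null))
    return (treated_resp - mean) / sd

# Current trial: 28 responders, 17 in the treated arm, null probability 111/221.
z_now = z_from_trial(28, 17, 111 / 221)        # about 1.11

# Hypothetically quadruple the sample with the same response rates:
# every count is multiplied by four, and z exactly doubles.
z_4x = z_from_trial(4 * 28, 4 * 17, 111 / 221)

assert abs(z_4x - 2 * z_now) < 1e-9  # quadrupling the sample doubles z
# 2 * 1.11 is about 2.22 > 1.96, so the enlarged study would (just) clear
# a two-sided 5% test if the observed effect size were exactly right.
```

Since the point estimate would then sit just past the critical value, sampling noise could push the result to either side, which is the sense in which such a study has roughly a 50-50 chance of resolving the question.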

It seems to me that it might be reasonable to give the FDA some money to finance studies of promising but unproven treatments. The current approach of having firms pay the entire cost of testing looks like a false economy, since firms can choose not to release negative data.

Still, my basic point stands: 17/111 > 11/110. Weak evidence in favor of the alternative is not proof that the alternative is false; don't reject the alternative that a treatment is better than nothing unless there is significant evidence that it is worse than nothing. If you can't know, don't write nonsense like 17/111 = 11/110.


Anonymous said...

Dear Robert,

The blog is a joy, but I really did not find this article confusing or poorly written. Quite the opposite :)


Anonymous said...

I repeatedly use the New York Times as a writing model, and this at "our" school :)


Robert said...

I agree that the article, taken as a whole, was clear and interesting.

However, the journalist interpreted insignificant evidence that the treatment worked as evidence that it did not work. Also note the text I quoted: the journalist in effect claimed that 17/111 is less than or equal to 11/110, which is simply false, not unclear, false.

This post was the second in a series about how insignificant evidence that a treatment works is treated as evidence that it does not work. This is a very common practice. It is also an elementary error.

In each case the NYT did not draw a false conclusion, but they did not explain what hypothesis testing is and isn't.

Rejecting the alternative is a major pet peeve of mine. Pointing out this error is one of the things I do for a living.

Thanks for your comments. You are very very kind. In fact, one of the things that convinced me I wanted to blog was a comment which you wrote on Brad's blog about his post about a coyote.

(I just checked: it was you, or at least someone named Anne who is very kind.)