Rejecting the Alternative II
Elementary Statistics on the Front Page of the New York Times again.
Rejecting the alternative hypothesis is an elementary error in statistics. In the Neyman-Pearson framework (hypothesis testing), the null hypothesis can be rejected in favor of an alternative. A failure to reject the null is not evidence against the alternative. I have mentioned this error before here. Today's case is very blatant.
Benedict Carey writes
But in the only rigorously controlled trial so far in depressed patients, the stimulator was no more effective than surgery in which it was implanted but not turned on.
and
In the study, doctors implanted the device in 235 severely depressed people. The stimulator sends timed pulses of electricity to the vagus nerve, which has wide connections throughout the brain.
Half of the patients then had their stimulators turned on. The investigators did not know which of their patients had their stimulators on.
After three months, researchers "unblinded" the study and compared levels of depression in the two groups based on standard measures of disease severity, the F.D.A. documents show. They found that 17 of the 111 patients who had implants turned on and completed the trial showed significant improvement. But 11 of 110 who had no stimulation and completed the trial also felt significantly better. The difference between the two groups was small enough to be attributable to chance.
It appears that Mr Carey is unaware of the subtle mathematical point that 17/111 is greater than 11/110. The arithmetic error is made worse by the fact that the false claim that the two proportions are the same is made before the jump, while the actual numbers, which are not the same, appear only after the jump.
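The comparison is easy to check with exact rational arithmetic. A minimal sketch using Python's standard fractions module; the counts are the completers who showed significant improvement, as quoted from the article above.

```python
from fractions import Fraction

# Completers who showed significant improvement, as reported in the article.
treated = Fraction(17, 111)   # stimulator switched on
control = Fraction(11, 110)   # stimulator implanted but left off

print(f"treated: {float(treated):.1%}")   # 15.3%
print(f"control: {float(control):.1%}")   # 10.0%
print("equal?", treated == control)
print("treated > control?", treated > control)
print(f"ratio of rates: {float(treated / control):.2f}")
```

Roughly 15% of treated completers improved against 10% of controls, about one and a half times the rate. Whether that gap can be distinguished from chance is a separate question.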
Now, the difference is not large enough to reject the null hypothesis that the two rates are the same, that is, that the treatment is ineffective. Under the null, the probability that 17 or more of the 28 positive responses fall in the treatment group is (roughly, using a normal approximation to the binomial without a continuity correction) 13%. To be careful, people tend to use a two-tailed test, that is, to ask what the chance is that 17 or more of the positive responses are in the treated group plus the chance that 17 or more are in the control group. That test would reject the null only at the (very roughly) 26% level. This is far above conventional significance levels.
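These tail probabilities can be reproduced in a few lines. This is a minimal sketch under the same simplifying assumption as the text: conditional on 28 responders, each one is taken to be equally likely (p = 0.5) to sit in either arm. That ignores the one-patient imbalance between arms (111 vs 110) and is not the same as Fisher's exact test. The helper names normal_upper_tail and binomial_upper_tail are illustrative; only the standard math module is used.

```python
import math

def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

responders = 17 + 11   # 28 completers improved in total
in_treated = 17        # of whom 17 had the stimulator switched on
p = 0.5                # assumption: under the null, either arm is equally likely

# Normal approximation to the binomial, no continuity correction.
mean = responders * p
sd = math.sqrt(responders * p * (1 - p))
z = (in_treated - mean) / sd
one_tailed_normal = normal_upper_tail(z)

# Exact binomial tail for comparison.
one_tailed_exact = binomial_upper_tail(in_treated, responders, p)

print(f"z = {z:.3f}")
print(f"normal approximation: one-tailed {one_tailed_normal:.3f}, two-tailed {2 * one_tailed_normal:.3f}")
print(f"exact binomial:       one-tailed {one_tailed_exact:.3f}, two-tailed {2 * one_tailed_exact:.3f}")
```

With these assumptions the normal approximation gives z of about 1.13, a one-tailed probability of about 13% and about 26% two-tailed. The exact binomial tail is larger, about 17% one-tailed and about 34% two-tailed, so the uncorrected approximation, if anything, overstates the evidence against the null. Either way the result is nowhere near conventional significance.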
It is, indeed, very strange that the FDA is considering approval of a treatment supported by such weak evidence. Like various experts quoted in the article, I would have expected the FDA advisory panel to tell Cyberonics Inc., the Houston company that makes the stimulator, that the device would be approved only after the company had performed a larger study, and then only if the pooled results were significant. If the point estimate of the benefit is about right, the total sample would need to be roughly quadrupled, so the new study would need about 660 patients, roughly 330 treated and 330 controls. If the device happened to be almost exactly as effective as (weakly) suggested by the current data, this would have roughly a 50-50 chance of resolving the question.
At the cost of $15,000 per patient mentioned in the article, it would cost about 10 million dollars (rounding up a bit for the cost of keeping FDA-quality records). I suspect that Cyberonics claimed that they couldn't afford such a study.
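The sample-size arithmetic can be sketched the same way. The key assumption: if the true effect equals the current point estimate and the response rates stay fixed, the expected z statistic grows with the square root of the pooled sample size, so doubling z requires quadrupling the sample. The sketch takes the uncorrected z of about 1.13 from the 221 completers and a two-tailed 5% critical value of 1.96; the helper name expected_z and the choice of 1.96 are illustrative assumptions, not figures from the article.

```python
import math

z_current = 3 / math.sqrt(7)   # ~1.13: uncorrected one-tailed z from the 221 completers
z_critical = 1.96              # assumption: two-tailed 5% critical value
n_current = 111 + 110          # completers in the original trial

def expected_z(multiplier: float) -> float:
    """Expected z if the true effect equals the point estimate and the
    pooled sample is multiplier * n_current (z scales with sqrt(n))."""
    return z_current * math.sqrt(multiplier)

# Multiplier at which the expected z just reaches the critical value,
# i.e. roughly a 50-50 chance of a significant pooled result.
break_even = (z_critical / z_current) ** 2

for m in (1, 2, 3, 4):
    print(f"x{m}: pooled n = {m * n_current}, expected z = {expected_z(m):.2f}")
print(f"break-even multiplier: {break_even:.2f}")
print(f"additional patients if quadrupled: {4 * n_current - n_current}")
```

Under these assumptions the break-even point is about three times the current sample, and quadrupling pushes the expected z to about 2.27, so quadrupling is a slightly generous round number. Quadrupling the 221 completers means about 663 additional patients, roughly 330 per arm.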
It seems to me that it might be reasonable to give the FDA some money to finance studies of promising but unproven treatments. The current approach of having firms pay all of the cost of testing seems to me to be a false economy, since firms can choose not to release negative data.
Still, my basic point stands: 17/111 > 11/110. Weak evidence in favor of the alternative is not proof that the alternative is false. Don't reject the alternative that a treatment is better than nothing unless there is significant evidence that it is worse than nothing. If you can't tell, don't write nonsense like 17/111 = 11/110.