Monday, March 31, 2008

That Hideous Strength

Back to Kirsch et al. Their study, which claimed that antidepressants provide clinically significant benefits only for the severely depressed, has received a lot of attention: 33 Google News hits and lots of plain Google hits.

Much commentary completely missed the "clinically significant" and, in fact, claimed that Kirsch et al. had shown that patients who received a placebo did "just as well" as patients who received antidepressants. That is a null hypothesis which Kirsch et al. overwhelmingly rejected.

The huge amount of attention received by the paper is, I think, entirely due to the appeal to the concept of "clinical significance" as authoritatively defined by the National Institute for Health and Clinical Excellence (NICE) in the UK. Hmm, where have I read that acronym before?

NICE declared that, to be clinically significant, a benefit had to be at least 3 points on the Hamilton Rating Scale for Depression (HRSD) or at least one half of one standard deviation of the changes in the HRSD in the treated subsample.

The first definition is arbitrary and, I think, nonsensical (if the only change were from "I think life is not worth living" to "I think life is worth living", that would be one point on the HRSD). However, the second definition is much, much more absurd.

The standard deviation of changes is very important for testing whether an apparent benefit could be due to chance. It is useful for constructing a confidence interval around estimated benefits. It is not useful for determining clinical significance.

I think an example should be sufficient to prove this. Assume there are two huge controlled trials, one of drug A and one of drug B. Each has a subsample of patients given a placebo. These people show improved depression, with an average improvement in HRSD of 5 (in both trials). The standard deviation of changes in HRSD is 5 in both placebo subsamples. Change over standard deviation is 5/5 = 1.

Patients in the trial of drug A who received drug A have an average improvement of 8 with a standard deviation of 5. Change/standard deviation is 8/5 = 1.6, which is 0.6 higher than 5/5, so it is concluded that drug A provides a clinically significant benefit.

For each patient who received drug A there is a patient who received drug B (convenient coincidence). For 90% of patients who received drug A there is a patient who received drug B and who had exactly the same change in HRSD. For the other 10% of patients who got drug A (and who had the same average benefit as the 90%) there is a patient who received drug B whose improvement was greater by 20 HRSD points.

The mean change with drug B is 8 + 0.1*20 = 10.
Given the assumption in parentheses, the variance of changes for patients who received drug B is 25 + (400)(0.1)(0.9) = 61, so the standard deviation is 7.81.
Change/standard deviation is 10/7.81 = 1.28, and 1.28 - 1 = 0.28 < 0.5, so NICE declares that drug B does not have a clinically significant benefit.
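
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. The function name standardized_gain and the variable names are my own labels for illustration, not anything defined by NICE or by Kirsch et al., and the 0.5 cut-off is simply the threshold as described in this post.

```python
import math


def standardized_gain(mean_drug, sd_drug, mean_placebo, sd_placebo):
    """Drug arm (mean change / SD of change) minus the same ratio for placebo."""
    return mean_drug / sd_drug - mean_placebo / sd_placebo


# Numbers taken from the example in the post.
placebo_mean, placebo_sd = 5.0, 5.0
a_mean, a_sd = 8.0, 5.0

# Drug B: same as drug A, except a fraction p of patients gain `bonus` extra points.
p, bonus = 0.1, 20.0
b_mean = a_mean + p * bonus
# Valid because the lucky 10% have the same average benefit as the rest,
# so the bonus indicator is uncorrelated with the underlying change.
b_var = a_sd ** 2 + bonus ** 2 * p * (1 - p)
b_sd = math.sqrt(b_var)

gain_a = standardized_gain(a_mean, a_sd, placebo_mean, placebo_sd)
gain_b = standardized_gain(b_mean, b_sd, placebo_mean, placebo_sd)

THRESHOLD = 0.5  # half a standard deviation, as described in the post
print(f"drug A: gain {gain_a:.2f}, passes cut-off: {gain_a >= THRESHOLD}")
print(f"drug B: gain {gain_b:.2f}, passes cut-off: {gain_b >= THRESHOLD}")
```

Changing p or bonus shows how quickly a handful of very good outcomes can push a drug below the cut-off.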

But wait a minute: the distribution of improvement with drug B first-order stochastically dominates the distribution of improvement with drug A. The "problem" with drug B is that a few patients had a wonderful experience. This added proportionally more to the standard deviation than to the mean (this has to do with the square root of 0.1 being much bigger than 0.1).
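
The dominance claim can also be checked on made-up data. The sketch below is only an illustration under assumptions of mine: the drug A changes are drawn from a normal distribution with mean 8 and standard deviation 5 (the post does not specify a shape), every tenth patient is the one who gets the extra 20 points, and dominates is a hypothetical helper name. For two samples of equal size, comparing the sorted values rank by rank is one way to test first-order stochastic dominance.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable
n = 10_000

# Assumed shape: normal changes with mean 8 and SD 5 for drug A.
changes_a = [random.gauss(8.0, 5.0) for _ in range(n)]

# Drug B: identical to drug A, except every tenth patient gains 20 extra points.
changes_b = [x + 20.0 if i % 10 == 0 else x for i, x in enumerate(changes_a)]


def dominates(better, worse):
    """True if `better` is at least as large as `worse` at every rank.

    For equal-size samples this is first-order stochastic dominance
    of the empirical distributions.
    """
    return all(b >= w for b, w in zip(sorted(better), sorted(worse)))


print("B dominates A:", dominates(changes_b, changes_a))
print("A dominates B:", dominates(changes_a, changes_b))
```

No patient does worse on drug B than the matching patient on drug A, so the first check has to come out true whatever the seed.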

Using the mean divided by the subsample standard deviation to assess clinical significance is utterly idiotic.

1 comment:

pj said...

Normally you'd only engage in this kind of standardisation if you were trying to combine results from different instruments - to do it with results all from the same measure really requires rather more justification than Kirsch et al supply because, as you point out, the potential downsides are rather obvious.

The ordinal nature of instruments such as the HRSD is a great limitation, as you say with reference to how significant even small changes can be (or how insignificant large changes can be), but they are an imperfect tool that is at least less opaque than Kirsch et al's 'd', which means nothing in particular. Another approach, indeed probably more common in medical meta-analysis, is to have predefined criteria of 'improved' and then to look at relative risks.

It is worth pointing out that NICE are a lot more circumspect with their 'clinical significance' cut-off than Kirsch et al (with Kirsch, as I've noted before, pointing out the arbitrary and unvalidated nature of NICE's criteria), particularly where the confidence intervals cross the 'clinical significance' cut-off even though the point estimate doesn't.