Tuesday, March 11, 2008

Personal chat with pj

updated

Pyjamas in bananas and I are trying to figure out what Kirsch et al did in Kirsch et al 2008. I have a thought on one of the deepest mysteries -- why are their confidence intervals on d so huge ?

notation: Change is the average change in the Hamilton scale of depression for patients who got an SSRI. Pchange is the average change for those who got a placebo. N is the number of patients who got ssri's (pn the number who got placebo).
d is change divided by the estimated standard deviation of the changes in the study so the t-statistic to test the hypothesis that the true expected change is zero is
d times the square root of n. pd same for placebo. Range is the difference between the upper and lower bounds of the 95% confidence interval for d. Prange same for pd.
rnrange = range times the square root of n, pnrprange prange times squar toot of pn.

some more regressions

reg rnrange change

Source | SS df MS Number of obs = 35
-------------+------------------------------ F( 1, 33) = 39.43
Model | 4.65045538 1 4.65045538 Prob > F = 0.0000
Residual | 3.89223357 33 .117946472 R-squared = 0.5444
-------------+------------------------------ Adj R-squared = 0.5306
Total | 8.54268895 34 .251255557 Root MSE = .34343

------------------------------------------------------------------------------
rnrange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
change | .1834387 .0292136 6.28 0.000 .1240031 .2428743
_cons | 3.590184 .3110234 11.54 0.000 2.957402 4.222966
------------------------------------------------------------------------------


My isn't that a large t-statistic.

Also


. reg pnrprange pchange

Source | SS df MS Number of obs = 35
-------------+------------------------------ F( 1, 33) = 81.05
Model | 3.92402808 1 3.92402808 Prob > F = 0.0000
Residual | 1.59773718 33 .048416278 R-squared = 0.7106
-------------+------------------------------ Adj R-squared = 0.7019
Total | 5.52176525 34 .16240486 Root MSE = .22004

------------------------------------------------------------------------------
pnrprange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pchange | .1676368 .0186208 9.00 0.000 .1297524 .2055211
_cons | 3.652399 .1451204 25.17 0.000 3.357149 3.947648
------------------------------------------------------------------------------


Oh my that is even huger. Oh I guess I should control for the size of the sample as for small samples the t-distribution has tails much fatter than the normal. OK

. reg pnrprange pchange pn

Source | SS df MS Number of obs = 35
-------------+------------------------------ F( 2, 32) = 54.22
Model | 4.2636823 2 2.13184115 Prob > F = 0.0000
Residual | 1.25808295 32 .039315092 R-squared = 0.7722
-------------+------------------------------ Adj R-squared = 0.7579
Total | 5.52176525 34 .16240486 Root MSE = .19828

------------------------------------------------------------------------------
pnrprange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pchange | .1825164 .0175267 10.41 0.000 .1468158 .2182171
pn | -.0035427 .0012053 -2.94 0.006 -.0059978 -.0010876
_cons | 3.726652 .1331891 27.98 0.000 3.455355 3.99795
------------------------------------------------------------------------------


I stress the placebo results, because I know that the sample variance of the change of those who got placebo is not very correlated with pchange. Still pchange has a t-statistic which, to put it mildly is HUGE.

How about weighting by the number who received the placebo in the trial ?

. reg pnrprange pchange pn [pweight = pn]
(sum of wgt is 1.8410e+03)

Linear regression Number of obs = 35
F( 2, 32) = 87.57
Prob > F = 0.0000
R-squared = 0.7761
Root MSE = .17404

------------------------------------------------------------------------------
| Robust
pnrprange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pchange | .1833535 .01469 12.48 0.000 .1534309 .2132761
pn | -.0029935 .0004081 -7.33 0.000 -.0038248 -.0021622
_cons | 3.682386 .1179811 31.21 0.000 3.442067 3.922706
------------------------------------------------------------------------------


OK it is clear that larger average change implies a broader confidence interval of d.

Hmmm why ? If true d were zero, then d times root n would have a t-distribution and the confidence interval of d*n^0.5 would rapidly converge to 2*(1.96) = 3.92.
Clearly the confidence intervals are calculated using the non centered t-like distribution, that is calculated using estimated change not under the null of change = 0.

Hmm d = change/sdhat. as n goes to infinity change goes to the population mean (m) and sdhat goes to sd. Linearizing around m/sd gives
d - m/sd is approximately equal to (change-mean)/sd - m(sdhat-sd)/sd^2

If this is calculated substituting estimates for nuisance parameters it becomes
(change-mean)/sdhat - change(sdhat-sd)/sd^2.

Ohooo that explains it. There is a term proportional to change. Now if I knew the plim of n((sdhat-sd)/sd)^2 I could recalculate the intervals. Odd Gauss tells me it is 0.98 so I guess it is really 1.
gauss program
i = 0;
c=0;
niter = 100000;
do while i < niter;
i = i+1;
a = rndn(10000,1);
c = c+((a'a)^0.5-1)^2/10000;

endo;
c= c/niter;
c;
gauss output
0.98001977


so for large n ((sdhat-sd)/sd)n^0.5 is roughly a unit normal as is ((sdhat-sd)/sdhat)n^0.5. so ((sdhat-sd)/sdhat^2)n^0.5 converges to a normal with variance sd^(-2)

((change-mean)/sdhat)n^0.5 converges to a unit normal so (d-(m/sd) converges to a normal with variance 1 + (m/sd)^2

A very rough sample approximation would be a normal with variance 1+d^2
so this crude confidence interval would be
range = 3.92*(1+d^2)^0.5/n^0.5

OK back to stata



. gen guess = 3.92*(1+d^2)^0.5

. reg rnrange guess

Source | SS df MS Number of obs = 35
-------------+------------------------------ F( 1, 33) = 193.71
Model | 7.29918489 1 7.29918489 Prob > F = 0.0000
Residual | 1.24350406 33 .037681941 R-squared = 0.8544
-------------+------------------------------ Adj R-squared = 0.8500
Total | 8.54268895 34 .251255557 Root MSE = .19412

------------------------------------------------------------------------------
rnrange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
guess | .6250211 .044908 13.92 0.000 .533655 .7163872
_cons | 1.534459 .2874405 5.34 0.000 .9496573 2.119261
------------------------------------------------------------------------------

. reg rnrange guess [pweight = n]
(sum of wgt is 3.2920e+03)

Linear regression Number of obs = 35
F( 1, 33) = 412.96
Prob > F = 0.0000
R-squared = 0.8869
Root MSE = .13623

------------------------------------------------------------------------------
| Robust
rnrange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
guess | .6686736 .0329047 20.32 0.000 .6017285 .7356187
_cons | 1.12118 .2094582 5.35 0.000 .6950337 1.547325
------------------------------------------------------------------------------

. reg rnrange guess if n>100

Source | SS df MS Number of obs = 10
-------------+------------------------------ F( 1, 8) = 198.03
Model | .678477025 1 .678477025 Prob > F = 0.0000
Residual | .027409729 8 .003426216 R-squared = 0.9612
-------------+------------------------------ Adj R-squared = 0.9563
Total | .705886753 9 .078431861 Root MSE = .05853

------------------------------------------------------------------------------
rnrange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
guess | .6539925 .0464743 14.07 0.000 .5468227 .7611624
_cons | 1.159944 .2917518 3.98 0.004 .4871627 1.832725
------------------------------------------------------------------------------

Well that is a large R squared isn't it. I'm using large n because my formula is asymptotic. However. the coefficient is not 1 at all is it ?

It looks as if the confidence interval isn't +/- 1.96 standard deviations. dividing rnrange by guess averaging and multiplying by 1.96 I get that the confidence interval is +/- 1.7 standard deviations of d. In other words it is a 90% interval not a 95% interval.

The fit isn't perfect, because I used a normal approximation and the range is larger than I guess for small samples as it should be. Maybe they did something like a t distribution with n-1 degrees of freedom plus a normal with variance d^2 or something.

Also the difference between n^2* the size of the interval and my guess isn't strictly a decreasing function of n. I guess this is due to rounding of d and the limits of the interval.

Now the square size of the confidence interval should *NOT* be used to calculate a precision weighted average of d. Such an average would obviously be biased down.

update:
Hmm there is no reason for the confidence interval to be centered around d given the squewy destribution of 1/sdhat. Is it.

duced is the upper limit - d and range is upper limit minus lower limit
reg duced range

Source | SS df MS Number of obs = 35
-------------+------------------------------ F( 1, 33) =55765.08
Model | 1.02594697 1 1.02594697 Prob > F = 0.0000
Residual | .000607123 33 .000018398 R-squared = 0.9994
-------------+------------------------------ Adj R-squared = 0.9994
Total | 1.02655409 34 .030192767 Root MSE = .00429

------------------------------------------------------------------------------
duced | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
range | .4999338 .0021171 236.15 0.000 .4956267 .504241
_cons | -.0006627 .0018018 -0.37 0.715 -.0043285 .003003
------------------------------------------------------------------------------


Now that is a really bit t-statistic. The confidence interval is symmetric.

My guess is that the finiteness of the sample is considered when calculating the 90% interval of (change-mean)/sdhat and not when calculating the 90% interval of change(sdhat-sd)/sd^2 (which is even further from normal for a given finite sample but not a distribution written into a software package so let's assume a squewed distribution is normal and insist on the non-normality of the t distribution.)

Thus new guess is the interval is d +/- ((t(.95,n-1))^2 + (1.645d)^2)^0.5
where 1.645= n(.95) and t(95,n-1) is the 95th percentile of the t with n-1 degrees of freedom. This makes no particular sense as it has nothing to do with the 90% interval of a t plus a normal.

And, in fact, my new guess works little better than my old guess.

1 comment:

pj said...

Hmm, that would be very annoying if someone had based their analyses on the confidence intervals being, you know, normal confidence intervals.

Looking back at the data it seems you're right that it is by essentially carrying out the meta-analysis on two entirely separate populations, the drug changes, and the placebo changes, and then subtracting one from the other, that they get their very low estimate of HRSD change.

That is a very odd way of doing things indeed, it is basically assuming that each study is really two separate and entirely unrelated studies, one on how people improve with drugs, and one on how they improve with placebo, so the way to analyse them is to ignore the study design and just try and estimate the pooled effect size for each group (drug and placebo) as if they were unrelated. It partly has an effect because the SDs depend on response, and because sample sizes are skewed towards drug groups in some studies (so the placebo group is much smaller than the drug group).

Taking your SD = change/d approach, and just plugging it into a meta-analysis program (SE weighting, fixed effects) giving an overall effect size of 1.9, it is interesting to note that the fluoxetine trials contribute half as much to the drug analysis (in terms of weighting) compared to the placebo analysis!

But as before, it is also interesting to see that segregating by drug gives effect sizes from 3.6 to .6 (or, given the silly form of this analysis, comparing individual drug groups to pooled placebo subjects 3.8 to -.2).