Wednesday, May 24, 2006

Supercomputers and Palmistry

The NSA is attempting to find something useful in the records of which US number was called from which US number and how long the conversation lasted. They never got Qwest data. Someone at the NSA anonymously boasted that they were analysing the world's largest data set. This implies either that they have more data than they admit or that someone at the NSA is incompetent. However, they are analysing a huge data set using extremely powerful computers and the latest available algorithms and estimators.
Given that, I would guess that the program is not only a crime but also a mistake.

I am attracted to an analogy. Let's say they had data on the palm prints of everyone in the USA and attempted to find suicide bombers by careful analysis of life lines. This would be an even larger data set and would require very sophisticated algorithms. It would also be idiotic, because all of the analysis would be based on a false assumption.

Now, with a large sample of palm prints of suicide bombers and non-suicide bombers, one can draw valid inferences about the association between creases on the palm and suicide bombing. Of course, I am sure the null hypothesis of no association is true. However, my point (if any) is that without a sample of suicide bombers it would be impossible to test the null hypothesis that there is nothing to palmistry against the alternative hypothesis that suddenly ending life lines are correlated with suicide bombing.

If someone at the palm print data mining program guessed that life lines mean something, that guess would never be abandoned. A huge amount of data with no data on the phenomenon of interest cannot be used to determine correlates of the phenomenon of interest. The fact that there may be trillions of floating point operations between the NSA's guess about a pattern typical of al Qaeda operatives and the NSA's suggestion that the FBI investigate a particular Pizza Hut changes nothing. Torturing a computer will not, sad to say, make a bad guess a good guess.
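
For readers who like to see the point in code, here is a minimal sketch of it. Everything in it is made up for illustration: the function name rate_difference, the eight "palm prints", and the labels are my assumptions, not anything the NSA or anyone else actually uses. It compares how often a feature (a suddenly ending life line) shows up among known bombers and among everyone else, and it has nothing to report when the sample contains no known bombers.

```python
from typing import Optional, Sequence


def rate_difference(feature: Sequence[bool], label: Sequence[bool]) -> Optional[float]:
    """Feature frequency among labelled positives minus frequency among negatives.

    Returns None when either group is empty, because the comparison
    is then undefined no matter how many records there are.
    """
    positives = [f for f, y in zip(feature, label) if y]
    negatives = [f for f, y in zip(feature, label) if not y]
    if not positives or not negatives:
        return None
    return sum(positives) / len(positives) - sum(negatives) / len(negatives)


# Hypothetical data: True means "life line ends suddenly".
short_life_line = [True, False, True, False, False, True, False, False]

# Case 1: plenty of palm prints, but nobody in the sample is a known bomber.
no_known_bombers = [False] * len(short_life_line)
print(rate_difference(short_life_line, no_known_bombers))  # None

# Case 2: the same palm prints, now with (invented) labels for the phenomenon of interest.
with_known_bombers = [True, False, False, False, False, True, False, False]
print(rate_difference(short_life_line, with_known_bombers))
```

Adding more rows to the first case never changes the answer. Only the labels in the second case make the association something that can be estimated, let alone tested.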

I think that the considerable patience Americans have with NSA lawbreaking is, of course, partly based on general contempt for the rule of law, but is also based on a mystical faith in powerful computers. Computers can come up with answers by following processes much too tedious for us to reproduce with pen and paper, but they can't turn a guess about al Qaeda sleepers into a well-supported hypothesis without data on al Qaeda sleepers.
