## Saturday, May 13, 2006

More Rock than Ore in the Data Mine

Now that we know exactly how much data the NSA is mining (the largest data set ever assembled), I, for one, want to know what the hell they think they can do with that data. They have huge computers (including ones that actually work because they weren't bought from MZM) and sophisticated algorithms, but you can't get blood out of a stone.
They have buzzwords ("data mining", "link analysis", "pattern analysis") but there is no hint that they have an algorithm that actually works and must be kept secret.

People are intimidated by huge data sets, complicated algorithms and powerful computers, but there is one basic fact in mathematical statistics: you can't determine which patterns are associated with terrorist activity unless you have a large sample of terrorist activity.

Having a huge pile of data to mine does not make it possible to determine the association between patterns and the event of interest if you have only a tiny sample of the event of interest. If the NSA had data on the calling patterns of tens of millions of Al Qaeda operatives, a big computer could be decisive in the war on terror. Even mere thousands, or perhaps hundreds, of known Al Qaeda sleepers might provide a large enough sample, with some variance in the dependent variable. In the real world, the NSA is comparing the phoning behavior of people who are not known terrorists to the phoning behavior of other people who are not known terrorists.
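To make the sampling argument concrete, here is a minimal Python sketch using the textbook standard error of an estimated proportion, sqrt(p(1 - p) / n). The 30% pattern rate and the sample sizes are hypothetical numbers chosen purely for illustration, not anything known about NSA data. The point it shows is that n is the number of known terrorists: the precision of what you learn about terrorists' calling patterns depends on that count, and the size of the pile of records on everyone else does not appear in the formula at all.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent observations."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical illustration only: suppose some calling pattern shows up
# in 30% of terrorists' phone records.
p_pattern = 0.30

for n_known_terrorists in (10, 1_000, 10_000_000):
    se = standard_error(p_pattern, n_known_terrorists)
    # Approximate 95% interval: estimate plus or minus 1.96 standard errors.
    print(f"n={n_known_terrorists:>10,}: 0.30 +/- {1.96 * se:.4f}")
```

With 10 known terrorists the interval is roughly 0.30 plus or minus 0.28, which is nearly useless; it only becomes tight when the number of known terrorists, not the number of phone records, becomes large.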

They may have guessed what terrorists do with phones, then searched the huge data set for the thousand phones that do the most of that. The FBI reports that this approach seems to be batting zero. Between the guess and the list of the 1000 most Jihadi Pizza Huts in the USA there is a lot of computing, but computing cannot make a bad initial guess better unless the computer has data on the outcome of interest.
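The guess-then-rank approach described above can be sketched in a few lines of Python. Everything here is hypothetical: the feature names and weights stand in for an analyst's guess and are not taken from any real program. What the sketch illustrates is that no label saying who actually is a terrorist enters the computation anywhere, so however many phones are scored, the top-k list is just the initial guess restated.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical guessed weights: an analyst's guess at what
# "terrorist-like" phone use looks like. No outcome data went into them.
GUESSED_WEIGHTS: Dict[str, float] = {
    "calls_abroad": 2.0,
    "prepaid_phone": 1.5,
    "late_night_calls": 0.5,
}


def guessed_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a phone's features under the guessed weights."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def top_k_phones(
    phones: Dict[str, Dict[str, float]], weights: Dict[str, float], k: int
) -> List[Tuple[float, str]]:
    """Return the k phones with the highest guessed score, highest first.

    Only features and guessed weights are used; nothing here knows which
    phones actually belong to terrorists.
    """
    scored = ((guessed_score(f, weights), phone_id) for phone_id, f in phones.items())
    return heapq.nlargest(k, scored)
```

For example, with `phones = {"A": {"calls_abroad": 3, "prepaid_phone": 1}, "B": {"late_night_calls": 10}, "C": {"calls_abroad": 1}}`, calling `top_k_phones(phones, GUESSED_WEIGHTS, 2)` returns `[(7.5, 'A'), (5.0, 'B')]`. Change the weights and the list changes with them; scoring more phones never corrects the weights.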

Now "link analysis" seems to include building a net outward from the few captured Al Qaeda terrorists who have any e-mail or phone contact with the USA. This might be a reasonable thing to do, but there is no reason the FISA court wouldn't authorize it. The FISA court had never denied a warrant when the data mining started. Why would it have banned link tracing starting from someone for whom there was legally obtained evidence giving probable cause to believe that he or she was a member of al Qaeda?

I think the illegal program wouldn't get past the FISA court because, once it is explained, it wouldn't pass the laugh test.