Wednesday, July 11, 2007

Data Mining Again

The AP reports:

The FBI is gathering and sorting information about Americans to help search for potential terrorists, insurance cheats and crooked pharmacists, according to a government report obtained Tuesday.

Records about identity thefts, real estate transactions, motor vehicle accidents and complaints about Internet drug companies are being searched for common threads to aid law enforcement officials, the Justice Department said in a report to Congress on the agency's data-mining practices.

In addition, the report disclosed government plans to build a new database to assess the risk posed by people identified as potential or suspected terrorists.



I say OK, OK, OK, huh? Analyzing records of "identity thefts, real estate transactions, motor vehicle accidents and complaints about Internet drug companies" makes sense. There is a huge amount of information on these kinds of illegal activity, and it may contain useful patterns that can be detected.

Data mining to catch terrorists is different. Fortunately, there is very little data on terrorists, mostly because there are few terrorists and possibly because some of them are sleepers who have not yet acted. With so few known cases to learn from, there is no way to analyze huge amounts of data and determine which apparently innocent behavior is associated with terrorism.

It is possible to make a guess about typical terrorist behavior and then have a computer process a huge amount of data based on that guess. The output will be as valid as the original guess, that is, probably worthless.
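To put rough numbers on that, here is a minimal sketch in Python. Everything in it is made up for illustration: the population size, the number of real targets, and the hit rates of the guessed rule are assumptions, not figures from the report. It just computes what fraction of the people flagged by a guessed rule would actually be terrorists.

```python
def flagged_precision(population, targets, hit_rate_targets, hit_rate_others):
    """Fraction of flagged records that are real targets, for a guessed rule.

    hit_rate_targets: share of real targets the rule happens to match.
    hit_rate_others:  share of everyone else the rule also matches.
    """
    true_flags = targets * hit_rate_targets
    false_flags = (population - targets) * hit_rate_others
    total = true_flags + false_flags
    return true_flags / total if total else 0.0


# All numbers below are hypothetical, chosen only to illustrate the argument.
population = 300_000_000   # records being searched
targets = 1_000            # assumed number of real terrorists in the data

# A guess that happens to be very good: matches 90% of targets, 0.1% of others.
good_guess = flagged_precision(population, targets, 0.9, 0.001)

# A guess unrelated to terrorism: matches both groups at the same 0.1% rate.
bad_guess = flagged_precision(population, targets, 0.001, 0.001)

print(f"good guess: {good_guess:.4%} of flagged people are targets")
print(f"bad guess:  {bad_guess:.6%} of flagged people are targets")
```

With these assumed numbers, a guess unrelated to terrorism produces a flagged list no better than picking names at random, and even the lucky guess flags hundreds of innocent people for every real target.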

The above seems totally obvious to me and yet the debate continues.
