Robert's Stochastic thoughts

Sunday, February 14, 2010

Always click the link -- Kevin Drum

I have very great respect for Kevin Drum. Therefore I am shocked by what happened when I followed his advice.

The links are in this post here:

Two years ago, when the FBI was stymied by a band of armed robbers known as the "Scarecrow Bandits" that had robbed more than 20 Texas banks, it came up with a novel method of locating the thieves.

FBI agents obtained logs from mobile phone companies corresponding to what their cellular towers had recorded at the time of a dozen different bank robberies in the Dallas area.

[skip]

I'm only linking to this because it's a pretty good guess that this is similar to the kind of data mining that the NSA is doing as part of its warrantless wiretapping program. (See here and here.)

Now, I just assumed that the links would take me to evidence in support of the claim. Instead, the first link takes me to this:

David Ignatius provides a plausible guess about how the NSA is using the database of calling records they've collected over the past few years:

etc etc etc

The evidence for Drum's guess is that Ignatius guessed something and that Drum found it plausible. (In my view, Ignatius made two different guesses, neither of which is similar to the case in Texas.)

Evidently, with the passage of time, "plausible" has matured into a "pretty good guess." Now, I work with data sometimes. I would not consider a plausible idea unsupported by any evidence to be a "pretty good guess." I'd say that it takes at least some data to judge the quality of a guess, as opposed to its attractiveness. Oddly, this utter, total contempt for data appears in a post about data analysis.

The key point is, as Drum advises: always click the links. A claim followed by a link usually means that the link presents evidence for the claim, not that it presents a weaker form of the same claim.

I am honestly shocked at what I read after clicking that link.

OK, now I click the second link and read:

DATA MINING UPDATE....Yesterday I noted that the NSA's domestic spying program was "a system for identifying criminals by statistical analysis," and suggested that Americans need to decide if they think it's appropriate to launch police surveillance on people simply because they fit a statistical profile. Today, Noah Shachtman points to a USA Today article that says that's exactly what's happening:

The template, officials say, was created from a secret database of phone call records collected by the spy agency. It has been used since 9/11 to identify calling patterns that indicate possible terrorist activity. Among the patterns examined: flurries of calls to U.S. numbers placed immediately after the domestic caller received a call from Pakistan or Afghanistan.

At least there is some evidence. But there are two huge problems. The first is the word "among." There is no claim that the NSA wasn't doing other, completely different things too. In fact, I think we can be confident that this was the most defensible of the many things they did, and therefore the one they discussed. Note that the link is presented in support of the claim that this is "similar to the kind of data mining that the NSA is doing as part of its warrantless wiretapping program." That asserts that they are doing one and only one kind of data mining (depends on what the meaning of "the" is). You can't make a "the" claim with "among" evidence. That's a simple, gross logical error.

The second problem is that the analogy with the Texas case is ... utter nonsense. In Texas, the information on the call was that it was made near the time and place of a bank robbery. In the NSA case, it is that the calls followed a call from Afghanistan or Pakistan. Nothing is said about the time of that call and, you know, those are two whole countries, not one or a few cell phone towers.

I guess it is better to link to evidence that proves your claim is false than to link to your old guess, presenting it as evidence for your new guess. Still shocking, though.

The post behind the second link contains a link to the post from one day before. The later post says the guess in the earlier post was exactly right. This is odd. The earlier post posits that the statistical analysis works "semi reliably." The later post contains no, 0 (zero), nada evidence on the reliability of the NSA algorithms. Drum just assumes that conclusions based on statistical analysis must be semi reliable. Since the second post, anonymous FBI agents have complained that the suggestions they got from the NSA were completely, totally unreliable.

Look, Drum, if you think conclusions reached by people who process a huge amount of data must be "semi reliable," then what the hell happened to financial markets last year? Tons of data plus bad assumptions can yield completely unreliable conclusions.

