Friday, February 11, 2005

On Hypotheses

Today the Mammal Group here at the KU Natural History Museum read a paper by Peter Lipton. (Unfortunately Science hides things behind its subscription wall, so unless you subscribe, you'll just have to believe me from here on.)
The paper itself was not spectacular. The question is interesting, though.

In the case of “accommodation,” a hypothesis is constructed to fit an observation that has already been made. In the case of “prediction,” the hypothesis, though it may already be partially based on an existing data set, is formulated before the empirical claim in question is deduced and verified by observation. Well-supported hypotheses often have both accommodations and successful predictions to their credit. Most people, however, appear to be more impressed by predictions than by accommodations.

Why? Well, Lipton leads us through various philosophical conundra, involving twins and comets and such. He first poses and rejects three possible explanations of our preference for prediction over accommodation: he rejects criticism of accommodation for being ad hoc, for being untested, and for being ambiguous in its generality.

After what felt like ten pages (but is actually one), we get to the real arguments against accommodation. One is (I think) petty: with prediction you can choose the most specific prediction of a theory to test and then gather data for it, while accommodation sticks you with whatever data are at hand. The other argument is about fudging. Because a prediction relies on a priori hypotheses, any effort to “fudge” the theory is likely to fail. Accommodation makes fudging easy.

I think pedantic arguments are to be expected from philosophers, but biologists want something different, so we talked about Bayes' rule and model building.

Ecology is a statistical science. We study populations, and we test hypotheses statistically. In statistics, you form a null hypothesis, a hypothesis where nothing interesting happens. For instance, if you are interested in a relationship between two random variables, you test the computed correlation coefficient against the null hypothesis that there is no correlation, coefficient = 0.

But what if you look through a bunch of scatter plots and choose the pairs of variables that look significantly correlated? Now your naive expectation is not 0 correlation. You rejected, a priori, anything close to that. By chance, a random sample from two uncorrelated measurements would give you a significant correlation (at the usual 5% level) once in 20 attempts. If you throw out anything that doesn't look correlated, you would expect some different fraction of false positives, but it's hard to know what that fraction is, and that's the value you need in order to assess statistical significance.
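The effect of picking out the "correlated-looking" scatter plots can be illustrated with a small simulation. This is only a sketch under assumptions that are mine, not from the paper or our discussion: 20 points per plot, both variables drawn as independent standard normals, and a hypothetical "eyeball" cutoff of |r| > 0.3 standing in for what looks correlated. It uses the standard t test for a correlation coefficient with the tabulated 5% two-sided critical value for 18 degrees of freedom.

```python
import math
import random

N = 20               # points per scatter plot (assumed for illustration)
T_CRIT_DF18 = 2.101  # two-sided 5% critical value of Student's t, 18 d.f.


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r):
    """Test r against the null of zero correlation at the 5% level (n = N)."""
    t = abs(r) * math.sqrt((N - 2) / (1 - r * r))
    return t > T_CRIT_DF18


def simulate(trials=5000, eyeball_cutoff=0.3, seed=1):
    """Return (false-positive rate over all pairs, rate among screened pairs)."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        # Two truly uncorrelated variables: any "significant" r is a false positive.
        xs = [rng.gauss(0, 1) for _ in range(N)]
        ys = [rng.gauss(0, 1) for _ in range(N)]
        rs.append(pearson_r(xs, ys))
    all_rate = sum(is_significant(r) for r in rs) / len(rs)
    # Keep only the plots that "look correlated" (hypothetical eyeball rule).
    screened = [r for r in rs if abs(r) > eyeball_cutoff]
    screened_rate = sum(is_significant(r) for r in screened) / len(screened)
    return all_rate, screened_rate


if __name__ == "__main__":
    all_rate, screened_rate = simulate()
    print("false positives, all pairs:     ", round(all_rate, 3))
    print("false positives, screened pairs:", round(screened_rate, 3))
```

Under these assumptions the first rate should land near one in 20, while the rate among the hand-picked pairs comes out several times higher. The exact inflation depends entirely on the cutoff, which is the point: when the screening is done by eye, nobody knows what the cutoff was.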

The point is that you have to perform statistical analyses on predictions, not accommodations.

The argument can be formalized with Bayes' rule. Without getting into the math, Bayes' rule says that your degree of belief in a hypothesis after you gather data (the posterior probability) is based on the interaction of
  • your initial belief in the hypothesis (the prior probability)
  • the probability of your experimental results if your hypothesis is true
  • the probability of your experimental results if your hypothesis is false

Accommodation uses the results to generate the hypothesis, so you can't estimate the prior probability, and at best you have to say that the odds of your results coming from your hypotheses are 50:50. As a result, your posterior probability is the same as your prior, and you don't know what either is.
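For anyone who does want the math, Bayes' rule with the three ingredients listed above fits in a few lines. This is a minimal sketch; the function name `posterior` and the example numbers are made up for illustration.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes' rule for a single hypothesis versus its negation.

    prior            -- initial belief in the hypothesis, between 0 and 1
    p_data_if_true   -- probability of the results if the hypothesis is true
    p_data_if_false  -- probability of the results if the hypothesis is false
    """
    # Total probability of seeing the results at all.
    evidence = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence


if __name__ == "__main__":
    # Hypothetical numbers: results much likelier if the hypothesis is true.
    print(posterior(0.3, 0.9, 0.1))
    # Results equally likely either way (the 50:50 case): belief does not move.
    print(posterior(0.3, 0.5, 0.5))
```

When the two likelihoods are equal, the likelihood terms cancel and the posterior equals the prior (up to rounding), whatever the prior was. That is the 50:50 situation accommodation leaves you in.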

Why is this important? Well, intelligent design creationism is a massive exercise in accommodation. It's impossible to generate predictions, and the post hoc explanations IDC offers are pure accommodations, with no principles driving the result. And IDC, like any attempt at integrating the unnatural or supernatural into science, can never be anything more. We can't construct testable theories about a supernatural or unnatural intelligence by the definitions of supernatural and unnatural.

Why else? It helps explain my instant skepticism about the HIV/OPV connection. That's a reasonable hypothesis built as an accommodation and an extension of existing theory. It generated testable predictions, and those predictions have been uniformly falsified. You can question the details, but nothing backs the hypothesis, nor can we categorically reject it. Every time new evidence arrives undermining the hypothesis, Hooper proposes an ever grander conspiracy. First the actual doctors in the field were covering their asses. Then their colleagues and friends were covering for them. Then the entire scientific and medical world was covering for what would be, at worst, an honest mistake.

Each broadening of the conspiracy is Hooper's effort to accommodate new data into his theory. Each time, his theory becomes more complex, and each new twist has no inherent basis in the theory, making it a new accommodation every time. The problem just compounds.

Now, does that make Hooper wrong? No. I think that, whatever the truth of his initial claim about OPV, he is almost definitely wrong about any broad scientific or medical conspiracy to cover up for Koprowski. More on that later.

“Don't Think Twice, It's All Right” by Bob Dylan & The Band from the album Before the Flood (1974, 4:37).