Monday, July 18, 2005

Confidence intervals

Warning: This is another in my occasional series demystifying statistical terminology. If you don't like stats and don't want to learn, skip on down.

In a discussion about ANWR in the comments, I cited this study, in which the USGS estimates that:
The total quantity of technically recoverable oil within the entire assessment area is estimated to be between 5.7 and 16.0 billion barrels (95-percent and 5-percent probability range), with a mean value of 10.4 billion barrels. Technically recoverable oil within the ANWR 1002 area (excluding State and Native areas) is estimated to be between 4.3 and 11.8 billion barrels (95- and 5-percent probability range), with a mean value of 7.7 billion barrels (table 1).
And I pointed out that we use about 18 million barrels a day, which works out to approximately 7 billion barrels per year, about the same amount of oil that the government estimated was technically recoverable in ANWR.
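For anyone who wants to check that arithmetic, here is a quick sketch in Python. The 18 million barrels a day and the 7.7 billion barrel mean come from the text above; the variable names are mine, and the "years of supply" figure assumes, unrealistically, that all of the oil could be pumped at once and that consumption stays flat.

```python
# Figures quoted above; the names are illustrative only.
BARRELS_PER_DAY = 18e6       # approximate U.S. consumption
ANWR_MEAN_BARRELS = 7.7e9    # USGS mean estimate for the ANWR 1002 area

annual_consumption = BARRELS_PER_DAY * 365
years_of_supply = ANWR_MEAN_BARRELS / annual_consumption

print(f"Annual consumption: {annual_consumption / 1e9:.2f} billion barrels")
print(f"ANWR mean estimate covers about {years_of_supply:.2f} years at that rate")
```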

The commenter responded in part by noting this explanation:
"The 95-percent probability level refers to 19 in 20 chances; the 5-percent probability level refers to 1 chance in 20 that the amounts shown will be AT LEAST that large."
I'm not sure what the point was, but let's do a quick backgrounder on confidence limits.

First, let me say that confidence limits suck, because even experts have trouble explaining them.

Technically, what the 95% limit says is that if you sampled from a population which actually had 11.8 billion barrels, you'd get a sample suggesting that only 7.7 billion or fewer were available about 5% of the time. Similarly, a population with only 4.3 billion barrels would give a sample suggesting 7.7 billion or more about 5% of the time.

That's different from claiming that there's a 5% probability that this sample was drawn from a population with far more or far fewer than 7.7 billion barrels. It doesn't seem like it should be, but it really is. Explaining why involves Bayes' theorem and all sorts of math that's hard to draw.
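A toy calculation shows the gap between the two statements. Everything in it is made up for illustration: I pretend there are only two possibilities (a "big" field and a "typical" one), I assume a low-looking sample turns up 5% of the time from the big field and 50% of the time from the typical one, and I try a few different prior beliefs about how common big fields are. None of these numbers come from the USGS.

```python
def posterior_big(prior_big, p_data_given_big=0.05, p_data_given_typical=0.5):
    """Bayes' theorem for two hypotheses: P(big field | low-looking sample).

    All inputs are hypothetical illustration values, not USGS figures.
    """
    joint_big = prior_big * p_data_given_big
    joint_typical = (1 - prior_big) * p_data_given_typical
    return joint_big / (joint_big + joint_typical)

# The 5% chance of the data given a big field never changes, but the
# probability of a big field given the data depends on the prior.
for prior in (0.01, 0.5, 0.9):
    print(f"prior P(big) = {prior:.2f} -> P(big | data) = {posterior_big(prior):.3f}")
```

The 5% in the confidence limit is the first kind of number, the chance of the data given the population. The second kind, the chance of the population given the data, can't be read off the limit without assuming a prior.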

If the population really had more than 11.8 billion barrels, fewer than 1 time in 20 would you randomly draw a set of samples with as little oil as found in the samples drawn by the USGS. But if it had about 7.7 billion barrels, it's very likely that you'd get about as much oil as they got in their samples. If they took good samples and got a representative set of data, it's likely that they did very well in estimating the volume of oil available.
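Here is a rough sketch of that "1 in 20" logic. It assumes the sampling error is normally distributed, which is my simplification and not something the USGS claims (their range isn't even symmetric around the mean), and it picks the spread so that 11.8 billion sits exactly at the 95% limit above 7.7 billion.

```python
from statistics import NormalDist

MEAN_ESTIMATE = 7.7   # billion barrels, USGS mean for the 1002 area
UPPER_LIMIT = 11.8    # billion barrels, USGS 5-percent value

# Assumption: normal sampling error, with the spread chosen so that the
# upper limit lies 1.645 standard deviations above the mean estimate.
z95 = NormalDist().inv_cdf(0.95)
sd = (UPPER_LIMIT - MEAN_ESTIMATE) / z95

# Chance of a sample suggesting 7.7 billion or less under each true value.
p_if_upper = NormalDist(UPPER_LIMIT, sd).cdf(MEAN_ESTIMATE)
p_if_mean = NormalDist(MEAN_ESTIMATE, sd).cdf(MEAN_ESTIMATE)

print(f"True value 11.8: P(sample <= 7.7) = {p_if_upper:.3f}")
print(f"True value  7.7: P(sample <= 7.7) = {p_if_mean:.3f}")
```

Under those assumptions, a sample this low is a 1-in-20 event if the truth is 11.8 billion, and a coin flip if the truth is 7.7 billion.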

Now, what if they're wrong?

If they blew it with unbiased samples, they're as likely to be too high as too low in their estimates. You can't go build roads and pipelines and oil derricks on the assumption that you're out in the tails of the distribution, because you can't know which tail you'd be in. Geologists are pretty smart folks, so they probably weren't sinking cores in areas that are unlikely to have oil; they were choosing areas that were good prospects. So if their samples were biased, the error is very likely to be an overestimate. But they know what they're doing, so they didn't bias their samples.

There's right around 8 billion barrels of recoverable oil in ANWR. Absent evidence to the contrary, no one should base policy around any other assumption.