Thursday, December 16, 2010

Bayesian at the Moon

I bet you were expecting more Japanese nonsense, but nah. Today I will talk about something that even mathematicians find boring: statistics.

A friend in the math department was presenting her work to some non-math people a while back, which is of course death for trying to say anything meaningful, because an invitation to do so always comes with the caveat of "no math." Anyway, her research is on random compositions. A composition of an integer is an ordered sequence of positive integers that sum to it (the unordered version, conventionally written from biggest part to smallest, is called a partition). I'm not an expert on that stuff, but it's sort of irrelevant, because isn't it obvious that this has nothing to do with terrorism? I only ask because someone apparently asked if her work could be applied to something about terrorism. Seriously. It's funny that there's this supposed order of intellectuals looking down on each other that goes something like

mathematicians > physicists > chemists > biologists > psychologists > sociologists > fruity humanities type people

because we really try not to look down on people, but then they go and ask questions like that. Incidentally, if anyone has any additions or revisions to that ordering, let me know in the comments. I'm curious.

So I'm almost done expositioning (expositing? exposing?). Somebody else asked a more relevant question: whether she had considered using Bayesian techniques or something like that. Not knowing what Bayesian really means, she didn't know what to say other than that she hadn't used them, and since I am apparently the go-to guy for statistics (???) she asked me about it later. All I could tell her was what I knew about Bayesian stats, which is this: in Bayesian statistics, you treat population parameters themselves as random variables. I read some more about it, but that's still my basic understanding of it.

You see, there are two approaches to statistics, frequentist (which is what I teach, sort of) and Bayesian (which is not what I teach). We'll see which makes more sense to you by looking at a problem.

Let's say that there's a population we're measuring something from, and that the measurement has some distribution. I won't make any assumptions about its shape, other than that it has finite mean and variance (this is not really much of an assumption, but if I feel like it, I may talk about a distribution that doesn't have these). Let's call that mean M. Normally, we would use the Greek letter mu, but I can't do that on here. Now, to a frequentist, this mean is just a fixed number that is inherent to the population, and we don't know it. If we want to know it exactly, well, we're out of luck, but we can make a good guess at it.

The way that statisticians guess is by taking a random sample and using an appropriate test statistic or estimator built from that random sample. To estimate the population mean, you can imagine that a good estimator is the sample mean. That is, take n measurements from the population, add them up, and divide by n. Let's call that value m. In fact, m is what's called an unbiased estimator because its expected value is the desired value, M.
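To see this in action, here's a quick Python sketch (numpy, with a completely made-up population; the exponential shape is just for illustration):

import numpy as np

# A made-up population (any shape is fine; only its mean matters here).
rng = np.random.default_rng(0)
M = 5.0
population = rng.exponential(scale=M, size=100_000)  # mean is M = 5

n = 30
sample = rng.choice(population, size=n)
m = sample.mean()        # the sample mean: our estimate of M
print(m)                 # near 5, but random; a new sample gives a new m

# "Unbiased" means the m's average out to M over many samples:
many_ms = rng.choice(population, size=(10_000, n)).mean(axis=1)
print(many_ms.mean())    # very close to 5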

You can see now what I meant: M is not random, but m is. M is fixed, to a frequentist, and so doesn't have probabilities associated with it, but m, being built out of a RANDOM sample, does. It doesn't have just one value, but a range of them, hopefully with those near the actual value of M having higher probabilities. Using m, we can then estimate M by building a confidence interval around it, which, depending on our confidence level, will probably contain M, though we can't say exactly where. This is what they mean on the news when they say that some proportion is something plus or minus a margin of error. That's a confidence interval.
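In code, the 95% interval looks something like this (a sketch with invented data; the 1.96 is the standard normal's 95% cutoff):

import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=2.0, size=100)  # invented data; the unknown M happens to be 5

n = len(sample)
m = sample.mean()
s = sample.std(ddof=1)              # sample standard deviation
half_width = 1.96 * s / np.sqrt(n)  # the "margin of error" from the news

print(f"{m:.2f} plus or minus {half_width:.2f}")
# Roughly 95% of intervals built this way contain the fixed (but unknown) M.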

Alternatively, if we wanted to see whether the M for our population is the same as some given mean, like a national average or an accepted value of some sort, we can perform a hypothesis test. This is slightly harder to understand, but I think it highlights the frequentist way of thinking. What you do here is come up with a null hypothesis that always looks like

H_0: M = (given value)

and assume that null hypothesis is true. Now there's a giant theorem, called the Central Limit Theorem, which states that under certain conditions, such as independent observations and a large enough sample size, sample means (which are random, remember!) have an approximately normal distribution centered at M, the population mean, with standard deviation sigma/sqrt(n), where sigma is the population standard deviation. The exact value there isn't what's important. What matters is that if we assume our population mean is the given value, we can find the probability of getting a sample mean like ours (m) [nearly] exactly.
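You can watch the Central Limit Theorem do its thing in a simulation (again a sketch, with a deliberately skewed, made-up population):

import numpy as np

# A deliberately skewed population: exponential, mean 1, standard deviation 1.
rng = np.random.default_rng(2)
population = rng.exponential(scale=1.0, size=1_000_000)
M, sigma, n = population.mean(), population.std(), 50

# 10,000 sample means, each from a sample of size n:
sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)
print(sample_means.mean(), M)                  # centered at M
print(sample_means.std(), sigma / np.sqrt(n))  # spread about sigma / sqrt(n)
# A histogram of sample_means looks normal even though the population doesn't.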

Put in common sense terms, if M is actually 0 and we take sample means from the population, most of them will be near 0, but not actually zero. Occasionally we would get a strange sample, but not that often, so if we get a mean that is "far" from 0, we conclude that our assumption of the null hypothesis must be wrong. (Statisticians supposedly think that rare events do happen, just not to them.) What a hypothesis test does is quantify how strange test statistics are by putting them in terms of conditional probabilities.
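Quantifying "far from 0" is then just a normal-distribution calculation. A sketch of the two-sided version, with invented data:

import numpy as np
from math import erf, sqrt

def normal_cdf(z):
    # CDF of the standard normal, via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

rng = np.random.default_rng(3)
sample = rng.normal(loc=0.3, scale=1.0, size=40)  # secretly M = 0.3, but we test H_0: M = 0

m, s, n = sample.mean(), sample.std(ddof=1), len(sample)
z = (m - 0) / (s / sqrt(n))             # standard errors between m and the hypothesized mean
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-sided: chance of a sample mean this strange
print(z, p_value)                       # a small p-value says "reject H_0"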

Now, what do Bayesian statisticians do? Sort of the opposite. They say that a population parameter is a random variable and look at the conditional probability that a hypothesis is true given some evidence. They calculate

P(H|E) = P(E|H)P(H)/P(E) = P(E|H)P(H)/[P(E|H)P(H) + P(E|~H)P(~H)]

which should strike you as weird for a couple of reasons. Firstly, we are looking for the probability of a hypothesis being true, such as the hypothesis that the earth orbits the sun. That doesn't quite make sense, because it is intuitively not a random thing; it's just something whose truth we don't know, at least to me. Secondly, to do that calculation we have to know P(H), the probability that the hypothesis is true before seeing the evidence. That is, we have to assume a distribution for the truth of H going in. Generally, this is something sort of intuitive, like: given two options about which we know nothing, there is a 50% chance of either being true.
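To make that formula concrete, here's a toy calculation (every number is invented). Say H is "this coin lands heads 70% of the time," the alternative is a fair coin, and E is "we flipped it 10 times and got 8 heads":

from math import comb

# All numbers invented for illustration.
p_H = 0.5                    # prior: knowing nothing, 50/50

def binom(k, n, p):
    # P(k heads in n flips of a coin with heads-probability p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_E_given_H = binom(8, 10, 0.7)      # likelihood of the evidence if H is true
p_E_given_not_H = binom(8, 10, 0.5)  # ... and if H is false (fair coin)

p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)
p_H_given_E = p_E_given_H * p_H / p_E
print(p_H_given_E)           # about 0.84: the evidence raised our belief in H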

Maybe this makes more sense to you, if you are scientifically minded. It's like the scientific method in that you go in thinking that a hypothesis is either true or not, with some probabilities assigned, you do an experiment, and the results of the experiment make it seem either more or less likely that the hypothesis is indeed true. Anyway, it's sort of interesting, right?

Since I have a bunch of time, I'll mention one distribution that doesn't meet that finite mean and variance condition from before.

Hopefully that picture loaded. I made it for a homework assignment a while ago using the Mac's built-in and extremely handy graphing utility. The red curve is a normal (also called Gaussian) distribution, which is extremely useful, but has finite mean and variance and is only there for comparison. The black curve is a Cauchy distribution, which doesn't have finite mean or variance. Notice that the "tails" on the Gaussian distribution go to zero very quickly, and that those on the Cauchy distribution don't go to zero nearly as fast. That is, it has "fat tails." This is an actual term in probability. Fat-tailed girls make the world go 'round.

The explanation I've heard for the Cauchy distribution is this. Imagine that you set up a flashlight so that it is pointed down at the ground from a given height, and allow it to rotate freely through 180 degrees in such a way that the angle between the vertical line from the flashlight to the ground and the ray of light is uniformly distributed. All that means is that the distribution for the angle is just a box; the probability is equal all throughout the 180 degrees and zero elsewhere. What is the distribution of the distance (technically, displacement) from the initial position of the light spot to where the light strikes the ground?

You can see that the probability of the light being perfectly parallel to the ground is 0 (since there are uncountably many possible angles), but that there is a considerable probability of the angle being near that, so the displacement can be huge. This is what produces the fat tails. The pdf is given by

f(x) = 1/[pi(1+x^2)]

which math nerds can tell you integrates to 1, so it is a legitimate pdf.
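The flashlight story translates directly into a simulation: draw a uniform angle and take its tangent. A sketch, with the flashlight at height 1 and the angle measured from vertical:

import numpy as np

rng = np.random.default_rng(4)
# Angle from vertical, uniform on (-90, 90) degrees: the freely rotating flashlight.
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=100_000)
x = np.tan(theta)  # displacement where the beam hits the ground (height 1)

# A histogram of x matches f(x) = 1/[pi(1+x^2)]. As a spot check, the
# fraction within 1 unit of the light should be about 0.5, since the
# integral of f from -1 to 1 is exactly 1/2:
print(np.mean(np.abs(x) < 1))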

What happens when you try to find the mean, though? To do so, you need to integrate

xf(x) = x/[pi(1+x^2)] ~ 1/x (for large x, up to the constant 1/pi)

which doesn't integrate. That is, the integral diverges. In other words, even though 1/x -> 0 as x gets big, it doesn't get small enough fast enough. So, a Cauchy distribution doesn't have a finite mean. For variance, you would need to integrate x^2 f(x), which you can imagine diverges even faster, so it doesn't have a finite variance, either. Or any higher moments, for that matter.
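You can also watch the missing mean misbehave: the running average of Cauchy samples never settles down, no matter how much data you collect. A sketch comparing it to a normal sample:

import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

# Cauchy samples via the flashlight construction; normal samples for contrast.
cauchy = np.tan(rng.uniform(-np.pi / 2, np.pi / 2, size=n))
normal = rng.normal(size=n)

running_cauchy = np.cumsum(cauchy) / np.arange(1, n + 1)
running_normal = np.cumsum(normal) / np.arange(1, n + 1)

# The normal running mean homes in on 0; the Cauchy one keeps lurching
# every time the flashlight sweeps near horizontal.
for k in (999, 9_999, 99_999, 999_999):
    print(k + 1, running_normal[k], running_cauchy[k])

Booyeah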

8 comments:

Sarah Mac said...

Just have to say that I'm thrilled that this is the first math related post that MAKES SENSE TO ME.

kilgore said...

I skipped most of the part after the theoretical hierarchy of academics, which I do agree with, given the condition that it is a closed loop: fruity humanities folks (especially English and studies of a particular culture, like African-American Studies, French, etc.; the worst is a specialist in a combination of those two, such as a specialist in French literature) look down on everyone else.

Hot Topologic said...

I'll try to throw up another short post that might make sense to people.

Also, talking about the hierarchy of academics reminds me of a hilarious passage from Cryptonomicon (which everyone should read even if they don't like SF in general) where our programmer hero is subjected to a dinner party with his wife's colleagues in the humanities.

PopsArmstrong said...

Yes! I remember that passage from Cryptonomicon! It was both hilarious and slightly disturbing to me as a loyal academic. And of course, everybody knows that all biology is really just chemistry... I'll leave it to others to continue.

the j link said...

Delicious! You remind me that I am 'out of the loop' for math humor - all I've got is Dr. Lee's "Don't worry, it's statistics... Not real math. Assume as you wish."

Hot Topologic said...

Did you ever get him holding up one hand, saying that if that is math, then statistics is the pinky finger? If you lose the pinky, then you'd still have a hand, or something like that? Any of his stories of spotting ghosts on campus?

the j link said...

No sir, we never got the "Hand of Math" analogy, but that's a good one. Plenty of talk about the positive connotation of poo showers, though. Ghosts at the physical plant, too, which is oddly believable... One day he spent at least 30 minutes explaining how [as you and I know] the "l" sound doesn't really exist in East Asian languages, and his problem with asking for "the bill" at a restaurant - the closest one can really get is "biru," and the waiter thought he was requesting a beer... In the end, we didn't get many stories because he spent most of his time telling a total jerk to shut up and come to office hours if he wanted to make snide commentary and rude harassment "jokes". In his classic infinitely-polite manner, of course.

Unrelatedly, I felt like getting to your blog in a roundabout way, and I think you'd be happy to know you're result #9 in a Google search for "fukakai". Captain Popular! But I still don't know what the word means...

Hot Topologic said...

Ah, yes, the infamous bill/biru story. He told us that one, too.