Since new posts push old posts down (FILO, I believe), comments on old posts get lost in the shuffle, and I can't be sure whether people read the responses I leave there. So, just to clear up some Japanese, I'll make this post.
The title of my blog is Iwakan, in Chinese characters 違和感, which translates roughly to "feeling out of place." For some pointless but fun (for me) analysis of the word, we can break it down into its roots, which for Chinese-root words (漢語) conveniently means by character, since each character roughly represents a single concept, as well as one (or possibly several) readings. I've written the Chinese reading* in parentheses, just so you can see how the borrowed readings (音読み) that follow change them a little.
違 - (wéi) i - This character is also read chiga-u, which even people who know only a little Japanese would recognize from Chigaimasu!, meaning wrong or different. My dictionary tells me that you could also use the character 異, (yì) i, for the first character. This has a similar meaning of foreign or different, but is much less common. The only words I can think of that use it are 異人 ijin, meaning "foreigner" or "barbarian," and 異なる kotonaru, which isn't very common but means "to differ."
和 - (hé) wa - This character is important in Japan because it is an old name for Japan, one that dates back to some ancient Chinese tome, but also because it means harmony, which is a central principle here. Social harmony is very important.
感 - (gân) kan - This one is simply feeling. It doesn't have a native Japanese reading, at least not one that I've ever heard used, but interestingly enough, it works as a suru-verb: you can attach the verb suru, "to do," to the end of it to make the verb "to feel." It is so common that over time this has contracted into the easier-to-say kanjiru, which then conjugates as you would expect a Japanese verb to.
Now, the word in the URL (the subdomain, I believe that part is called) is fukakai, written in Chinese characters as 不可解, meaning "incomprehensible." Again, let's break down this word.
不 - (bù) fu - Not. As in the famous "bu yao!"
可 - (kê) ka - Possible.
解 - (jiê) kai - Explain, understand, solve.
So, if you put it all together, you get "not possibly understood," or "incomprehensible."
I thought the names were fitting.
*Making the upside-down circumflex (the caron) is more hassle than it is worth, so you get stuck thinking that the tone I'm implying is up-down, when it's really more like down-up.
Sunday, December 26, 2010
More Math
I said I would get around to another short post, so I'll do that. There should be a Christmas post coming, but I don't have pictures yet, so it wouldn't be as good as it could be.
Anyway, while reading about Bayesian inference on Wikipedia, I came across the Raven paradox, also known as Hempel's paradox. You can just read on there about how it works, or you can read my explanation which will be basically the same thing. So, here goes.
Consider the statement "all ravens are black." As a logical statement, this is the same as "if x is a raven, then x is black." Like all logical statements, it is logically equivalent to its contrapositive, which is "if x is not black, then x is not a raven." If you don't deal with logical statements much, think about why the two are equivalent: if being a raven forces blackness, then anything that isn't black can't have been a raven, or it would have been forced to be black.
Now, if we use the scientific (inductive) method, we can support or disprove this statement (or its contrapositive) with evidence. For example, if we see a raven that is black, we support our statement, and if we see a raven that isn't black, we have disproven the statement. Of course, since the statement and its contrapositive are equivalent, supporting one is the same as supporting the other, and disproving one is the same as disproving the other.
So, what if we see a green apple? It is green, and since an apple is not a raven, it supports the contrapositive "if x is not black, then x is not a raven." So, this observation supports the original statement. But what happens if we start with the (obviously false) statement "if x is a raven, then x is white"? The contrapositive here is "if x is not white, then x is not a raven." Seeing the green apple again supports this contrapositive, so the same observation simultaneously supports the (obviously contradictory) statements "all ravens are black" and "all ravens are white."
Weird.
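Since I got onto this from Bayesian inference anyway, I'll mention the usual Bayesian resolution: the green apple really does support "all ravens are black," just by a ridiculously tiny amount, because there are so many more non-black things than ravens. Here's a minimal sketch in Python under an invented urn model (all of the counts, and the assumption that we sampled a random non-black object, are mine, not part of the paradox):

n_nonblack_nonravens = 1_000_000       # green apples, red buses, and so on
n_nonblack_ravens_if_false = 10        # non-black ravens, if the hypothesis is false

prior_h = 0.5                          # prior P(H), H = "all ravens are black"

# Evidence E: a randomly sampled non-black object turns out not to be a raven.
p_e_given_h = 1.0                      # under H, every non-black object is a non-raven
p_e_given_not_h = n_nonblack_nonravens / (n_nonblack_nonravens + n_nonblack_ravens_if_false)

posterior_h = (p_e_given_h * prior_h) / (p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h))
print(f"prior {prior_h:.7f} -> posterior {posterior_h:.7f}")   # barely above 0.5

The apple nudges the probability up by a few parts in a million, which matches the intuition that apple-watching is a spectacularly inefficient way to do ornithology.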
Thursday, December 16, 2010
Bayesian at the Moon
I bet you were expecting more Japanese nonsense, but nah. Today I will talk about something that even mathematicians find boring: statistics.
A friend in the math department was presenting her work to some non-math people a while back, which is of course death for trying to say anything meaningful, because an invitation to do so always comes with the caveat of "no math." Anyway, her research is on random compositions. A composition of an integer is basically a sequence of positive integers that sum to it, where the order of the parts matters (order the parts from biggest to smallest instead and you get a partition). I'm not an expert on that stuff, but it's sort of irrelevant, because isn't it obvious that this has nothing to do with terrorism? I only ask because someone apparently asked if her work could be applied to something about terrorism. Seriously. It's funny that there's this supposed order of intellectuals looking down on each other that goes something like
mathematicians > physicists > chemists > biologists > psychologists > sociologists > fruity humanities type people
because we really try not to look down on people, but then they go and ask questions like that. Incidentally, if anyone has any additions or revisions to that ordering, let me know in the comments. I'm curious.
So I'm almost done expositioning (expositing? exposing?). Somebody else asked a more relevant question: whether she had considered using Bayesian techniques or something like that. Not knowing what Bayesian really means, she didn't know what to say other than that she hadn't used them, and since I am apparently the go-to guy for statistics (???), she asked me about it later. All I could tell her was what I knew about Bayesian stats, which is this: in Bayesian statistics, you treat population parameters themselves as random variables. I read some more about it, but that's still my basic understanding of it.
You see, there are two approaches to statistics, frequentist (which is what I teach, sort of) and Bayesian (which is not what I teach). We'll see which makes more sense to you by looking at a problem.
Let's say that there's a population that we're measuring something from, and that the measurement has some distribution. I won't make any assumptions about its shape other than that it has finite mean and variance (this is not really much of an assumption, but if I feel like it, I may talk about a distribution that doesn't have these). Let's call that mean M. Normally, we would use the Greek letter mu, but I can't do that on here. Now, to a frequentist, this mean is just a fixed number that is inherent to the population, and we don't know it. If we want to know it, well, we're out of luck, but we can make a good guess at it.
The way that statisticians guess is by taking a random sample and using an appropriate test statistic or estimator built from that random sample. To estimate the population mean, you can imagine that a good estimator is the sample mean. That is, take n measurements from the population, add them up, and divide by n. Let's call that value m. In fact, m is what's called an unbiased estimator because its expected value is the desired value, M.
You can see now what I meant by M not being random but m being random. M is fixed, to a frequentist, and so doesn't have probabilities associated with it, but m, being built out of a RANDOM sample, has probabilities. It doesn't have just one value, but a range of them, hopefully with those near the actual value of M having higher probabilities. Using m, we can then estimate M by building a confidence interval around it, which, depending on our confidence level, will probably contain M, though we can't say where. This is what they mean on the news when they say that some proportion is something plus or minus a margin of error. That's a confidence interval.
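To make that concrete, here's a minimal Python sketch of the whole procedure. The exponential population is an invented stand-in (any distribution with finite mean and variance would do), and 1.96 is the usual large-sample 95% multiplier:

import math
import random

random.seed(1)
M = 10.0                                                  # the "unknown" population mean
sample = [random.expovariate(1 / M) for _ in range(200)]  # 200 random measurements

n = len(sample)
m = sum(sample) / n                                       # sample mean: our unbiased estimator
s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample standard deviation

margin = 1.96 * s / math.sqrt(n)                          # the "plus or minus" from the news
print(f"m = {m:.2f}, 95% CI = ({m - margin:.2f}, {m + margin:.2f})")

Run it a bunch of times with different seeds and about 95% of the intervals you build will contain the true M of 10. That's what the confidence level means.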
Alternatively, if we wanted to see whether the M for our population is the same as some given mean, like a national average or an accepted value of some sort, we can perform a hypothesis test. This is slightly harder to understand, but I think it highlights the frequentist way of thinking. What you do here is come up with a null hypothesis that always looks like
H_0: M = (given value)
and assume that null hypothesis is true. Now there's a giant theorem, called the Central Limit Theorem, which states that under certain conditions, such as independent observations and a large enough sample size, sample means (which are random, remember!) have an approximately normal distribution centered at M, the population mean, with standard deviation (sigma)/(n^.5), where sigma represents the population standard deviation. The exact value there isn't what's important. What matters is that if we assume that our population mean is a given value, we can find the probability of getting a sample mean like ours (m) [nearly] exactly.
Put in common-sense terms, if M is actually 0 and we take sample means from the population, most of them will be near 0, but not actually zero. Occasionally we would get a strange sample, but not that often, so if we get a mean that is "far" from 0, we conclude that our assumption of the null hypothesis must be wrong. (Statisticians supposedly think that rare events do happen, just not to them.) What a hypothesis test does is quantify how strange test statistics are by putting them in terms of conditional probabilities.
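Here's what that looks like as a minimal sketch, assuming (unrealistically) that we know the population sigma; all the numbers are invented:

import math
import random

random.seed(0)
M0, sigma, n = 0.0, 1.0, 100                 # null value, population sd, sample size

sample = [random.gauss(0.3, sigma) for _ in range(n)]   # secretly, the true mean is 0.3
m = sum(sample) / n

z = (m - M0) / (sigma / math.sqrt(n))        # how many standard errors m sits from M0
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided P(|Z| >= |z|) under H_0
print(f"m = {m:.3f}, z = {z:.2f}, p = {p_value:.4f}")   # tiny p: reject H_0

The p-value is exactly the conditional probability I mentioned: the chance of a sample mean at least this strange, given that the null hypothesis is true.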
Now, what do Bayesian statisticians do? Sort of the opposite. They say that a population parameter is a random variable and look at the conditional probability that a hypothesis is true given some evidence. They calculate
P(H|E) = P(E|H)P(H)/P(E) = P(E|H)P(H)/[P(E|H)P(H) + P(E|~H)P(~H)]
which should strike you as weird for a couple of reasons. Firstly, we are looking for the probability of a hypothesis being true, such as the hypothesis that the earth orbits the sun. That doesn't make sense, at least to me, because the hypothesis is intuitively not a random thing; it's just something whose truth we don't know. Secondly, to do that calculation we have to know P(H), the prior probability that the hypothesis is true before seeing any evidence. That is, we have to assume a distribution for the truth of H going in. Generally, this is something sort of intuitive, like: given two options about which we know nothing, there is a 50% chance of either being true.
Maybe this makes more sense to you, if you are scientifically minded. It's like the scientific method in that you go in thinking that a hypothesis is either true or not, with some probabilities assigned; you do an experiment; and the results of the experiment make it seem more or less likely that the hypothesis is indeed true. Anyway, it's sort of interesting, right?
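As a minimal numeric sketch of that formula (all of the probabilities here are completely made up):

p_h = 0.5                  # prior P(H): no idea going in, so 50/50
p_e_given_h = 0.9          # P(E|H): the experiment usually comes out this way if H is true
p_e_given_not_h = 0.2      # P(E|~H): ...and rarely comes out this way if H is false

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # total probability of E
p_h_given_e = p_e_given_h * p_h / p_e                   # Bayes' rule
print(f"P(H|E) = {p_h_given_e:.3f}")                    # 0.818: E raised P(H) from 0.5

One experiment took the hypothesis from a coin flip to better than four-to-one odds, and you could feed that 0.818 back in as the prior for the next experiment.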
Since I have a bunch of time, I'll mention one distribution that doesn't meet that finite mean and variance condition from before.
Hopefully that picture loaded. I made it for a homework assignment a while ago using the Mac's built-in and extremely handy graphing utility. The red curve is a normal (also called Gaussian) distribution, which is extremely useful, but it has finite mean and variance and is only there for comparison. The black curve is a Cauchy distribution, which doesn't have a finite mean or variance. Notice that the "tails" on the Gaussian distribution go to zero very quickly, and that those on the Cauchy distribution don't go to zero nearly as fast. That is, it has "fat tails." This is an actual term in probability. Fat tailed girls make the world go 'round.
The explanation I've heard for the Cauchy distribution is this. Imagine that you set a flashlight up a given distance from the ground, pointed down, and allow it to rotate freely through 180 degrees in such a way that the angle between the vertical line from the flashlight to the ground and the ray of light is uniformly distributed. That just means the distribution for the angle is a box: the probability density is equal throughout the 180 degrees and zero elsewhere. What is the distribution of the distance (technically, displacement) from the initial position of the light spot to where the light strikes the ground?
You can see that the probability of the light being perfectly parallel to the ground is 0 (since there are uncountably many possible angles), but there is considerable probability of the angle being near that, in which case the displacement is huge. This is what results in the fat tails. The pdf is given by
f(x) = 1/[pi(1+x^2)]
which math nerds can tell you integrates to 1, so it is a legitimate pdf.
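If you don't trust the calculus, you can fake the flashlight directly. This minimal sketch (with the light at height 1, my simplification) checks the samples against the exact Cauchy CDF, under which P(|X| <= 1) = 2*arctan(1)/pi = 0.5 exactly:

import math
import random

random.seed(2)
n = 100_000
# A uniform angle in (-90, 90) degrees, pushed through tan(), is where the light lands.
xs = [math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]

frac = sum(abs(x) <= 1 for x in xs) / n
print(f"empirical P(|X| <= 1) = {frac:.3f}  (exact: 0.500)")

Half the light lands within one flashlight-height of center, and the other half is spread over the entire rest of the line. Those are the fat tails.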
What happens when you try to find the mean, though? To do so, you need to integrate
xf(x) = x/[pi(1+x^2)] ~ 1/x
which doesn't integrate. That is, the integral diverges. In other words, even though 1/x -> 0 as x gets big, it doesn't get small fast enough. So, a Cauchy distribution doesn't have a finite mean. For the variance, you would need to integrate x^2f(x), which you can imagine diverges even faster, so it doesn't have a finite variance either. Or any well-defined higher moments, for that matter. Booyeah
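And here's a minimal sketch of why the missing mean actually matters in practice: the running average of Cauchy samples never settles down the way a Gaussian one does (the sample sizes are arbitrary):

import math
import random

random.seed(3)

def sample_mean(n, draw):
    return sum(draw() for _ in range(n)) / n

gauss = lambda: random.gauss(0, 1)
cauchy = lambda: math.tan(random.uniform(-math.pi / 2, math.pi / 2))

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: gaussian mean {sample_mean(n, gauss):8.4f}, "
          f"cauchy 'mean' {sample_mean(n, cauchy):10.4f}")

The Gaussian column shrinks toward 0 as n grows; the Cauchy column jumps around no matter how big n gets, because the sample mean of Cauchy samples is itself Cauchy distributed.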
Wednesday, December 15, 2010
A Post in Japanese (日本語の投稿)
Since I'm in Japan, posts are better in Japanese, right? I was thinking of making a trip back to my second hometown today, but Mie-chan's mother invited me to dinner last night, so I don't think I'll go. My beloved Okuizumo is several hours by train from Yonago, so if I went and came back, my time there would be short. Tomorrow is apparently also out because I'm going to a drinking party with Mie-chan, so I'm thinking of making it next week.
Anyway, it's snowing. I don't even want to go to the convenience store nearby. It feels like my face is going to freeze! But a hot lemon drink sounds so good! What to do???
A list of what I've eaten since arriving in Japan:
Sukiya gyudon (beef bowl)
Miso soup
My beloved natto on rice
Gomoku-meshi ("five-something rice," though I don't really get the nuance)
Curry rice (beef, of course)
Takoyaki (that's tako as in octopus 鮹, not kite 凧)
Instant yakisoba
Nabe (hot pot)
Potato salad
Doesn't it all sound delicious!
Tuesday, December 14, 2010
Natto
I don't have anything new to post, so I'll post an old picture of an old favorite, something I can't get in America, even in Philly's Chinatown, which is otherwise full of great stuff, not all of it Chinese, including pho restaurants, Japanese candy, and general Korean weirdness. The food I have missed more than any other is probably natto, which most people think is disgusting, but which I just ate another bowl of today.
It's just fermented beans, basically, and it smells sort of awful, but it tastes really good, at least to me, and is apparently awesome for you. I think in the picture it has shouyu (soy sauce) and karashi (mustard), which is my favorite combination of things to eat it with, but I have been eating it with daikon oroshi (grated daikon [I think we use the word daikon in English, right?]) {grouping symbols}, which is also good.
Speaking of grouping symbols, did you know that there is such a thing as a Lie bracket? It looks just like a bracket, but it is named after the Norwegian mathematician Sophus Lie (so it's pronounced "lee"). Here is an example:
[Y,L]
You would think that this is a closed interval containing the endpoints Y and L, but maybe it is a binary operator, defined in some crazy way with a sum of partial derivatives, at least if Y and L are vector fields.
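Here's a minimal sketch of that partial-derivative version, for vector fields on the plane; the two example fields are my own toy choices, and it uses sympy for the symbolic differentiation:

import sympy as sp

x, y = sp.symbols("x y")
coords = [x, y]

E = [x, y]      # the "Euler" field x*d/dx + y*d/dy
R = [-y, x]     # the rotation field -y*d/dx + x*d/dy

def lie_bracket(X, Y):
    # [X, Y]^k = sum_i (X^i * dY^k/dx_i - Y^i * dX^k/dx_i)
    return [sp.simplify(sum(X[i] * sp.diff(Y[k], coords[i])
                            - Y[i] * sp.diff(X[k], coords[i])
                            for i in range(len(coords))))
            for k in range(len(coords))]

print(lie_bracket(E, R))   # [0, 0]: these two particular fields commute

A bracket of [0, 0] means the two flows commute; pick two random fields instead and you'll almost always get something nonzero.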
Math is crazy like that; you always think you know a bunch, and then it turns out you don't know anything.
Monday, December 13, 2010
Winter Trip
Well, I've made it to Japan and am currently messing around in Yonago. There's not much going on yet, but the trip should involve an "illumination cruise" in Osaka, a trip to the Tottori sand dunes, and playing the role of Santa Claus himself. Not much to add beyond that, but it's going great.