oscarbonilla.com
6Oct/091

The Two Envelopes Problem

A recent thread in reddit about the two envelopes problem reminded me of how unintuitive probabilities can be. There is a fundamental flaw with how the original post worded the problem:

You and I both have envelopes filled with money. My envelope contains either double or half the amount of money that’s in yours. If you want, I’m going to let you switch envelopes. Should you stay, switch, or does it not matter?

Stop right there. Read that again. Does that make sense to you? See if you can set up the experiment in the real world. Grab two envelopes and some money. Take $100 and put them in an envelope, then, uh... what do you do with the other envelope? Put $50 in it? Put $200?

The original poster then went ahead and solved assuming probabilities of 1/2. But I'm not really choosing between two equally likely options. I don't know the probability distributions!

A better wording for the problem (or a different problem if you're pedantic) would be:

There are two envelopes. One contains double the amount of money than the other. You choose one of them, I take the other. Now I offer you a choice between keeping your original choice or switching envelopes with me. Is it to your advantage to switch?

The answer should be obvious, it doesn't matter.

Tagged as: 1 Comment
5May/0922

The Monty Hall problem

After all the positive feedback I got from Visualizing Bayes' theorem, I thought I'd post my explanation of the Monty Hall problem. I was fascinated for a while with this problem because at first it doesn't seem to make any sense. And most of the explanations I've seen have a magic feel to them. I've even seen people with math backgrounds argue against the result.

I think the problem has to do with most of the explanations mixing two very different probability classes. But I'm getting ahead of myself.

The Problem

The problem statement (from Wikipedia) is:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? (Schrock 1990) [1]

If you are like most people, your gut feeling is going to be "it does not matter. There is a 50-50 chance that the car will be in the door I have already selected". Like most people, you would be wrong.

In order to solve this problem, you need to consider two very distinct pieces of information. One is "what state is the world in?", the other is "what events have occurred?". Let me elaborate.

The States of the world

When I deal with probabilities and get confused, I revert to counting. What are all the possible outcomes? What subset of those outcomes am I looking at? So let's look at all the posible states.

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats.

Let's translate that to a simple diagram. We have three doors, let's label them A, B, and C. We also know that there is a car behind one of the doors, it can be behind door A, B, or C. So we have three possible configurations of the world:

picture-22The door with the car has been colored red. The possible states of the world are Sa (meaning the car is behind door A), Sb (car behind door B), and Sc (car behind door C).

Quick, what is the probability of the car being behind door A? Or to put it another way, what is the probability that the world is in state Sa? Note that these two questions are the same, but the latter reminds us of the possible states. It is easy then to see that

p-fig1Now I'm going to ask you a very different question. Assume that instead of playing, you are observing your friend play the game. What is the probability that he will pick door A? Did you say 1/3? Of course you did, there are three doors and he must pick one. However, and this is really important, these probabilities are completely independent of the probabilities above!

Think of it this way. If two doors had cars behind them instead of just one, would that change the number of states the world can be in? Would it change the probability of your friend choosing door A? What if all doors had cars? Now there is just a single state for the world, but your friend can still choose door A with 1/3 probability!

If we now say that A is the event of your friend opening door A, we can easily see that

p-fig2which is just saying each door has an equal probability of being opened.

The trick to solving this problem is that you are dealing with two different classes of probabilities. One is the probability that the world is in a given state (a priori), the other is that a participant (you or Monty) chooses any given door. If you can keep these two classes of probabilities separate, you will be able to easily solve this problem.

Let's see what happens once you have made the choice. Let's assume you chose the door labeled A.

What does Monty do?

Monty is, of course, the host of the game show. And he is trying to screw you. He doesn't want you to get the car. He also happens to know where the car is, or in other words, he knows the state of the world. You on the other hand, are only guessing. You chose door A with probability 1/3, now he gets to choose a door. But there are only two doors left. He must choose either door B or door C.

So how does Monty choose? What is the probability that Monty will open the door labeled B? Here's what the world looks like to Monty:

picture-23Door A has been chosen, he must choose between doors B and C. Were he to choose randomly, each door would be equally probable,

picture-10

The little m's are a reminder that this is Monty's choice, not your original choice. If you wanted to be pedantic you could add P(Am) = 0 since Monty can never choose door A (you chose that one).

But he knows better than to choose randomly, he knows whether we're in Sa, Sb, or Sc! How he chooses will depend on what the state of the world is. If we are in Sa, it doesn't matter what he chooses, he'll choose B or C with equal probability as neither has a car[2]. That is to say "given that we are in state Sa, Monty chooses door B with probability 1/2", in other words:

picture-12If we are in state Sb, Monty will never choose door B, that would be giving away the prize.

picture-14And finally, if we are in state Sc, Monty will always choose door B.

picture-15Your turn

And now it's your turn to make a decision. Do you stay with your original choice of door A? Or do you switch and choose door C?

First of all, does it matter? Is there any difference whether we switch or not? The answer is a resounding YES! Monty gave away information about the state of the world by choosing door B. We can use that information.

What we are really trying to figure out is this: "what configuration is the world in?" Monty knows, he used that information to choose door B. Now we have to ask ourselves, "given that Monty chose door B, what is the probability that the world is in state Sa versus the probability of the world being in state Sc?" Note that we have eliminated Sb because Monty has already opened door B and there was no car!

Here is where we can apply Bayes' formula to get an answer:

picture-16and plugging in the values we have already computed:

picture-17Which means we have a 1/3 chance the car is behind our original choice of door A. Note that this 1/3 probability is different than the original P(A) = 1/3 we had computed. It ends up being the same number, but it is really a different probability!

What about the car being behind door C? What is the probability that we are in state Sc, given that Monty chose door B?

picture-18and plugging in the values we get:

picture-21Which means we have a 2/3 chance the car is behind the other door (door C). Therefore we should switch.

Does it matter if Monty knows?

The only reason we managed to gain any information from Monty's choice of door B is that he knows the state of the world and acts differently depending on what state is the correct one. If Monty didn't know, he would have been choosing randomly between doors B and C and we would not have gained any information (except that Sb is not possible since there was no car behind door B).

If Monty always chooses door B when neither B nor C have the car, we would not have gained any information when he opens door B, but we would gain information if he opens door C (do you see why?).

Note that it is still to our advantage to switch doors, since in this case the probabilities for the states Sa and Sc are the same (1/2). And if we don't know whether Monty knows or not, we might get an advantage from switching.

  1. If you are really pedantic, the mathematically correct definition of the problem is: "Suppose you’re on a game show and you’re given the choice of three doors. Behind one door is a car; behind the others, goats. The car and the goats were placed randomly behind the doors before the show. The rules of the game show are as follows: After you have chosen a door,the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it. If both remaining doors have goats behind them, he chooses one randomly. After Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and the host opens Door 3, which has a goat. He then asks you “Do you want to switch to Door Number 2?” Is it to your advantage to change your choice? (Krauss and Wang 2003:10)" []
  2. This is important, if he doesn't choose either door randomly, you gain information asymmetrically. Keep reading. []
Tagged as: 22 Comments
1May/0947

Visualizing Bayes’ theorem

I recently came up with what I think is an intuitive way to explain Bayes' Theorem. I searched in google for a while and could not find any article that explains it in this particular way.

Of course there's the wikipedia page, that long article by Yudkowsky, and a bunch of other explanations and tutorials. But none of them have any pictures. So without further ado, and with all the chutzpah I can gather, here goes my explanation.

Probabilities

One of the easiest ways to understand probabilities is to think of them in terms of Venn Diagrams. You basically have a Universe with all the possible outcomes (of an experiment for instance), and you are interested in some subset of them, namely some event. Say we are studying cancer, so we observe people and see whether they have cancer or not. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual, either he has cancer or not. We can then split our universe in two events: the event "people with cancer" (designated as A), and "people with no cancer" (or ~A). We could build a diagram like this:

venn-aSo what is the probability that a randomly chosen person has cancer? It is just the number of elements in A divided by the number of elements of U (the Universe). We denote the number of elements of A as |A|, and read it the cardinality of A. And define the probability of A, P(A), as

eq01

Since A can have at most the same number of elements as U, the probability P(A) can be at most one.

Good so far? Okay, let's add another event. Let's say there is a new screening test that is supposed to measure something. That test will be "positive" for some people, and "negative" for some other people. If we take the event B to mean "people for which the test is positive". We can create another diagram:

venn-bSo what is the probability that the test will be "positive" for a randomly selected person? It would be the number of elements of B (cardinality of B, or |B|) divided by the number of elements of U, we call this P(B), the probability of event B occurring.

eq02

Note that so far, we have treated the two events in isolation. What happens if we put them together?

venn-lastWe can compute the probability of both events occurring (AB is a shorthand for A∩B) in the same way.

eq04

But this is where it starts to get interesting. What can we read from the diagram above?

We are dealing with an entire Universe (all people), the event A (people with cancer), and the event B (people for whom the test is positive). There is also an overlap now, namely the event AB which we can read as "people with cancer and with a positive test result". There is also the event B - AB or "people without cancer and with a positive test result", and the event A - AB or "people with cancer and with a negative test result".

Now, the question we'd like answered is "given that the test is positive for a randomly selected individual, what is the probability that said individual has cancer?". In terms of our Venn diagram, that translates to "given that we are in region B, what is the probability that we are in region AB?" or stated another way "if we make  region B our new Universe, what is the probability of A?".  The notation for this is P(A|B) and it is  read "the probability of A given B".

So what is it? Well, it should be

eq03And if we divide both the numerator and the denominator by |U|

eq051we can rewrite it using the previously derived equations as

bayes-eq1What we've effectively done is change the Universe from U (all people), to B (people for whom the test is positive), but we are still dealing with probabilities defined in U.

venn-justb

Now let's ask the converse question "given that a randomly selected individual has cancer (event A), what is the probability that the test is positive for that individual (event AB)?". It's easy to see that it is

bayes-eq2Now we have everything we need to derive Bayes'  theorem, putting those two equations together we get

bayes-eq3which is to say P(AB) is the same whether you're looking at it from the point of view of A or B, and finally

bayes-eq4

Which is Bayes' theorem. I have found that this Venn diagram method lets me re-derive Bayes' theorem at any time without needing to memorize it. It also makes it easier to apply it.

Example

Take the following example from Yudowsky:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammograms. 9.6% of women without breast cancer will also get positive mammograms. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

First of all, let's consider the women with cancer

example14Now add the women with positive mammograms, note that we need to cover 80% of the area of event A and 9.6% of the area outside of event A.

example21It is clear from the diagram that if we restrict our universe to B (women with positive mammograms), only a small percentage actually have cancer.  According to the article, most doctors guessed that the answer to the question was around 80%, which is clearly impossible looking at the diagram!

Note that the efficacy of the test is given from the context of A, "80% of women with breast cancer will get positive mamograms". This can be interpreted as "restricting the universe to just A, what is the probability of B?" or in other words P(B|A).

Even without an exact Venn diagram, visualizing the diagram can help us apply Bayes' theorem:

  • 1% of women in the group have breast cancer → P(A) = 0.01
  • 80% of those women get a positive mammogram, and 9.6% of the women without breast cancer get a positive mammogram too → P(B) = 0.8 P(A) + 0.096 (1 - P(A)) = 0.008 + 0.09504 = 0.10304
  • we can get P(B|A) straight from the problem statement, remember 80% of women with breast cancer get a positive mammogram → P(B|A) = 0.8

Now let's plug those values into Bayes' theorem

eq06which is 0.0776 or about a 7.8% chance of actually having breast cancer given a positive mammogram.

Tagged as: 47 Comments
17Dec/083

Is that possible?

I normally read the good math bad math blog, where Mark Chu-Carroll debunks crackpots that try to hide their crazyness behind bad math. A while ago, he posted an article called "Why Math?", where he discusses how math allows you to "without ambiguity, prove that something is true or false".

While I agree with Mark, I think it's not completely clear from his post the importance of math's role in deciding what to believe.

One comment in particular inspired me to write this post.

Statements of the form "___ is impossible." strike me as deeply unscientific and dogmatic.

See, math is a formal system for thinking logically. The heart of math lies in being able to prove things from assumptions. So in that sense, if the math is done right, Mark's remark "without ambiguity, prove that something is true or false" is spot on.

However, the assumptions need to be true or your whole argument is invalid, no matter how good the math is.

But going back to "___ is impossible", I think there are different levels of "impossible" that we need to be concerned about.

We can classify impossible things in three categories: logically impossible, physically impossible, and technologically impossible.

The logically impossible

To understand the first category, we need to resort to some philosophical bullshit. Back in the day when people didn't have reddit and used to wander around in their togas thinking whether we could think at all, the great Aristotle proposed what he called his three laws of thought. These form the basis of logic and if you think about them, they make perfect sense. The first law is called the law of identity, and it just says an object is equal to itself. The second law is the law of noncontradiction and it says that nothing can both have a property and not have it at the same time. The third law is the law of excluded middles and it says that objects either have a certain property or they don't

I particularly like Avicena's description of the law of noncontradiction:

Anyone who denies the law of non-contradiction should be beaten and burned until he admits that to be beaten is not the same as not to be beaten, and to be burned is not the same as not to be burned.

This is where math helps. Math allows you to follow arguments to their logical conclusion and either prove or disprove them. It's formalized thinking, and it's full of little shortcuts and notation that allows you to be efficient in reaching your conclusions and in communicating those conclusions to others.

If you find something to be logically impossible, you've found a real impossibility. No way around it. Time travel is the best example of a logical impossibility. If you travel back in time to a place (and time) you have never been before, you would violate the law of noncontradiction (you were both there and not there). That's why sophisticated science fiction writers like Michael Crichton always have the travelers go to a parallel universe.

The physically imposible

Some things however, are not logically impossible, but they violate the laws of physics. This is slightly harder to detect, since our understanding of the laws of physics changes and it has been known to be completely wrong. But if you claim that something that violates the laws of physics is possible, the burden is on you to convince us that the laws of physics are wrong. Not an easy feat!

One example of this would be the claim that you can travel faster than light. Unless you're willing to debunk Poincaré's theory of relativity, you're going to have trouble having anyone believe you.

The technologically impossible

Other things don't violate either the laws of logic (math) or the laws of physics, yet they are beyond our current reach. This is the weakest level of impossibility because it could be just a matter of time before we get there. Almost all current technology would have fallen in this category only a century ago. Just imagine trying to explain the Internet and iPhones to people of the 1800's, they would have certainly shouted "Witchery!".

In conclusion, statements of the form "___ is impossible" don't need to be unscientific and dogmatic.

Tagged as: 3 Comments
25Nov/087

Simple Guide to Complexity Theory

After reading Jeff's terribly misguided post about NP-completeness I thought to myself: "If Jeff, who seems to be completely clueless about complexity theory, can write a blog post about it then so can I."

So without further ado, here's ob's complex guide to simplifying complexity theory.

First of all let's get the definitions out of the way.

Turing machine: An idealized computer, with infinite memory, an infinitely fast cpu, and unlimited bandwidth. It's a mathematical model of computer so it can kick your Core 2 Duo's sorry little ass faster than you can say Pentium. (update: I originally wanted to make the point that bandwidth and cpu speed are irrelevant, but it seems to have confused more than clarified.)

Problem: In complexity theory, a problem is really a decision problem. It has a simple yes/no answer. You don't ask "take this list and sort it and print the last ten items" that's not a decision problem. You do ask "does this program, given this input, produce the last ten items of the sorted input?".

(update: Although the sort problem as outlined here would be a valid decision problem, it is confusing since just changing my "last ten items" to "last C items" where C is a function of n would make it not be a decision problem. Better to use some other example, like "is the number given prime?")

n: Size of the input for the turing machine. E.g. if your turing machine is sorting strings, like in the example above, n is the number of symbols that it takes as input.

T(n): Number of steps (i.e. instructions) that a turing machine must execute to solve a problem with input of size n.

Polynomial Time: When T(n) can be expressed as an equation of n, and possibly some constants, using only addition, subtraction, multiplication, and constant non-negative whole number exponents. E.g. n² is a polynomial while 2ⁿ is not.

Non-deterministic turing machine: A turing machine that can, at every branch point, take all possible branches in parallel. We call this guessing the optimal answer because, since the turing machine takes all branches, it must take the optimal branch too. If any of the parallel turing machines finds an answer, the non-deterministic turing machine stops and gives that answer (this means that it's ok for some branches to never end).

So take your already idealized computer from above and make it able to reproduce itself on every branch point. I.e. it can follow all possible decisions in parallel.

P: The class of problems that can be solved by a turing machine in polynomial time.

NP: The class of problems that can be solved by a non-deterministic turing machine in polynomial time.

Reducing a problem to another: Imagine you have two problems, P1 and P2. You know something about P1 (e.g. that it belongs to NP). Now you want to prove some property of P2, for instance, that it belongs to NP as well.

One technique is to show that if we could solve P2 in polynomial time (i.e. P2 belongs to P), we could use that answer to solve P1 in polynomial time. This would mean that P1 belongs to P which is a contradiction. This technique is called reductio ad absurdum. (update: I completely botched this and commenter Bibi correctly called me on it. Here's an attempt to clarify a reduction: if you want to solve an instance of P2, you convert it to an instance of P1, and then solve P1 to get an answer. Thus, you've reduced P2 to P1. How's that?)

NP-complete: We say a problem is NP-complete if it satisfies two properties: 1) it belongs to NP, and 2) every other problem that belogs to NP can be reduced, in polynomial time, to it.

This means that if we come up with a solution for an NP-complete problem that runs in polynomial time in a deterministic turing machine (i.e. belongs to P), any other problem in NP can be solved in polynomial time by first reducing it (again, in polynomial time) to this problem, and using our solution. We would have proved that P = NP and we would have also won (pinky to corner of mouth) one million dollars.

NP-hard: A problem P1, that is so hard, that even though we can prove number two above, we can't prove number one. This means we know P1 to be just as hard as anything in NP and possibly harder, but we don't know if P1 is in NP or not. P1 is very likely to require exponential time.

(update: I stand by my definition of NP-hard. Although not being able to prove #1 is a sufficient condition, it is not necessary. All problems in P are in NP, and yet we call them P, not NP. Thus, even if all problems that are NP-complete are also NP-hard, it's uninteresting to call an NP-complete problem NP-hard. The point of calling a problem NP-hard is that, it's really hard! Even if it turned out that P = NP, an NP-hard problem might still be hard!)

So there you have it, go impress your friends. I hear that "hey, baby! your love is NP-hard to get" is a very successful pickup line at pubs around M.I.T.

Tagged as: 7 Comments