# oscarbonilla.com

## Bayes’ Theorem Using Trimmed Trees

I think it does a great job of walking you through multiple applications of Bayes’ theorem. I find it easier to use the equations, but we all learn in different ways so I thought it was worth a link.

Written by ob

January 28th, 2013 at 11:55 pm

Posted in Math

## Mersenne Primes

In 1653, Marin Mersenne, of Mersenne Primes fame, made the bold claim that $2^{67}-1$ was a prime number. That claim remained unchallenged for 250 years – no computers back then – until…

…in 1903, Frank Nelson Cole of Columbia University delivered a talk with the unassuming title “On the Factorization of Large Numbers” at a meeting of the American Mathematical Society. “Cole – who was always a man of very few words – walked to the board and, saying nothing, proceeded to chalk up the arithmetic for raising 2 to the sixty-seventh power,” recalled Eric Temple Bell, who was in the audience. “Then he carefully subtracted 1 [getting the 21-digit monstrosity 147,573,952,589,676,412,927]. Without a word he moved over to a clear space on the board and multiplied out, by longhand,

$193,707,721 \times 761,838,257,287$

“The two calculations agreed. Mersenne’s conjecture – if such it was – vanished into the limbo of mathematical mythology. For the first…time on record, an audience of the American Mathematical Society vigorously applauded the author of a paper delivered before it. Cole took his seat without having uttered a word. Nobody asked him a question.”[1]
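Cole later admitted the factorization had cost him “three years of Sundays.” Today his chalkboard arithmetic can be replayed in microseconds; a quick check in Python:

```python
# Cole's 1903 result: 2^67 - 1 is composite, contrary to Mersenne's claim.
mersenne_67 = 2**67 - 1
print(mersenne_67)  # 147573952589676412927, the 21-digit monstrosity

# The two calculations agree, just as they did on Cole's chalkboard.
assert 193707721 * 761838257287 == mersenne_67
```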

Now I know where Professor Felton found his inspiration.

You’ve probably heard of Felton (National Academy of Science, IEEE Past President, NRA sustaining member). My advisor told me later that Felton’s academic peak had come at that now-infamous 1982 Symposium on Data Encryption, when he presented the plaintext of the encrypted challenge message that Rob Merkin had published earlier that year using his “phonebooth packing” trap-door algorithm. According to my advisor, Felton wordlessly walked up to the chalkboard, wrote down the plaintext, cranked out the multiplies and modulus operations by hand, and wrote down the result, which was obviously identical to the encrypted text Merkin had published in CACM. Then, still without saying a word, he tossed the chalk over his shoulder, spun around, drew and put a 158-grain semi-wadcutter right between Merkin’s eyes. As the echoes from the shot reverberated through the room, he stood there, smoke drifting from the muzzle of his .357 Magnum, and uttered the first words of the entire presentation: “Any questions?” There was a moment of stunned silence, then the entire conference hall erupted in wild applause. God, I wish I’d been there.[2]

1. The Man Who Loved Only Numbers by Paul Hoffman []
2. Auto-weapons by Olin Shivers. []

Written by ob

June 12th, 2011 at 6:18 pm

Posted in Math

## Harry Potter and the Methods of Rationality


Have you ever wondered whether some books would be better if the author had rewritten them after they were done? I have wondered that about the Harry Potter books.

I read the original Harry Potter books back when they came out, and even though I found them quite entertaining, they have many flaws. They got progressively worse as their universe became more complicated and they tried to have a deeper story. But it definitely shows in the books that J.K. Rowling did not have a “grand plan” and was basically just making stuff up as she went. In the words of nexes300:

Rowling showed absolutely no planning of the universe past the third book and added things as she liked. Also, Hogwarts is a failure of a school, and Harry and his friends are terrible magic users.

I also disliked the one-dimensional characters. In Harry Potter’s universe you are either good or evil; there is no in-between. Still, the books are entertaining.

Last weekend I had a cold, and I serendipitously got a tip from the brown dragon about a new fan fiction Harry Potter book.

…if you are the type of person who read Harry Potter and thought:

• Set up experiments to test magic
• Witches turn into CATS? But… but… but… what about conservation of mass?
• Gringotts deals in gold? There are arbitrage opportunities between muggles and the wizarding world!
• I have a Time-Turner? I can try to prove that P == NP

And when I saw who the author[1] was, I had to read it.

Harry Potter and the Methods of Rationality is the best Harry Potter book that I’ve read. If you enjoyed the original series and you like science, I cannot recommend it enough.[2]

1. Yeah, that Yudkowsky,  perhaps you remember him from here. []
2. With just one caveat, the work isn’t finished. Don’t expect the story to end. []

Written by ob

December 23rd, 2010 at 1:42 am

Posted in Books,Math,Reviews

## Doing it wrong

Couldn’t resist posting this one from xkcd:

Written by ob

September 20th, 2010 at 10:14 am

Posted in Humor,Math


## Lucia de Berk


This is infuriating.

In June 2004, Lucia was convicted of 7 murders and 3 attempted murders by the Court of Appeal in The Hague. She was given a life sentence; in view of the lack of evidence, a perplexing sentence. There are no eye witnesses, there is no direct incriminating evidence. Lucia was never seen in a suspicious situation. She was never found in possession of any of the poisons she was alleged to have used.

So how did they catch this supposed murderer? Why were they even investigating her?

Everything started with an at first glance striking number of incidents (deaths or resuscitations) during Lucia’s shifts at the Juliana Children’s Hospital in the Hague: the JKZ. The run drew attention to her. Seven incidents in a row all in the shifts of one nurse could not possibly be a matter of chance! The services of a former statistician, now professor of Psychology of Law, Henk Elffers, were called in, and the number he came up with must have wiped out all remaining doubt. He figured that the probability that all of seven incidents could have happened during Lucia’s shifts by pure chance was 1 in 6,000,000,000.

So instead of looking at the data to support a theory, they looked at the data to form a theory. This is totally the wrong approach. You can find all sorts of patterns given a large enough data set. That is why seasoned researchers form a theory first and then analyze or gather data in order to test the theory. If you have no theory you’re just doing cargo cult science. As for the 1 in 6,000,000,000 chance, it looks like a case of the Birthday Paradox. Given enough deaths and nurses, the probability of some nurse being present in 7 consecutive deaths is pretty high. Ben Goldacre has more.
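The birthday-paradox effect is easy to see in a toy simulation. The numbers below (27 nurses, 60 incidents, each incident landing in a uniformly random nurse’s shift) are invented purely for illustration, not taken from the actual case:

```python
import random
from collections import Counter

def streak_probability(n_nurses=27, n_incidents=60, streak=7, trials=10_000):
    """Toy model: each incident lands in a uniformly random nurse's
    shift. Return how often SOME nurse racks up a scary streak."""
    hits = 0
    for _ in range(trials):
        counts = Counter(random.randrange(n_nurses) for _ in range(n_incidents))
        if max(counts.values()) >= streak:
            hits += 1
    return hits / trials

random.seed(1)
# For any ONE nurse a streak like this is rare, but across a whole
# hospital of nurses it shows up in a sizable fraction of runs.
print(streak_probability())
```

The point is not the particular number it prints, but that a pattern which looks damning for a single pre-selected nurse is unremarkable once you remember how many nurses there were to find it in.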

Even more bizarre was the staggering foolishness by some of the statistical experts used in the court. One, Henk Elffers, a professor of law, combined individual statistical tests by taking p-values – a mathematical expression of statistical significance – and multiplying them together. This bit is for the nerds: you do not just multiply p-values together, you weave them with a clever tool, like maybe ‘Fisher’s method for combination of independent p-values’. If you multiply p-values together, then chance incidents will rapidly appear to be vanishingly unlikely. Let’s say you worked in twenty hospitals, each with a pattern of incidents that is purely random noise: let’s say p=0.5. If you multiply those harmless p-values, of entirely chance findings, you end up with a final p-value of p < 0.000001, falsely implying that the outcome is extremely highly statistically significant. With this mathematical error, by this reasoning, if you change hospitals a lot, you automatically become a suspect.

Multiplying p-values? Really?
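Goldacre’s twenty-hospitals example can be sketched in a few lines of Python. Fisher’s method rests on the fact that, under the null hypothesis, $-2\sum \ln p_i$ follows a chi-squared distribution with $2k$ degrees of freedom; for even degrees of freedom the tail probability has a closed form, so no stats library is needed:

```python
import math

def fisher_combined(pvalues):
    """Fisher's method for combining k independent p-values.
    Under the null, -2 * sum(ln p) ~ chi-squared with 2k d.o.f."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-squared survival function for even d.o.f. 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

noise = [0.5] * 20             # twenty hospitals, pure chance in each
print(math.prod(noise))        # naive product: ~9.5e-07, looks "significant"!
print(fisher_combined(noise))  # properly combined: ~0.93, pure noise
```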

Written by ob

April 9th, 2010 at 4:19 pm

Posted in Math


## Pigeons Beat Students at Probabilities

Interesting. Pigeons outperform humans at the Monty Hall problem. First the pigeons:

Each pigeon was faced with three lit keys, one of which could be pecked for food. At the first peck, all three keys switched off and after a second, two came back on including the bird’s first choice. The computer, playing the part of Monty Hall, had selected one of the unpecked keys to deactivate. If the pigeon pecked the right key of the remaining two, it earned some grain. On the first day of testing, the pigeons switched on just a third of the trials. But after a month, all six birds switched almost every time, earning virtually the maximum grainy reward.

Then the students:

At first, they were equally likely to switch or stay. By the final trial, they were still only switching on two thirds of the trials. They had edged towards the right strategy but they were a long way from the ideal approach of the pigeons. And by the end of the study, they were showing no signs of further improvement.

There is something to be said about our preconceptions and how biased we can be when looking at data. Pigeons are immune to this.

Despite our best attempts at reasoning, most of us arrive at the wrong answer.

Pigeons, on the other hand, rely on experience to work out probabilities. They have a go, and they choose the strategy that seems to be paying off best.

I’ve written about the Monty Hall Problem here.

P.S. In case you missed the joke, look here.

Written by ob

April 4th, 2010 at 11:05 am

Posted in Math


## Introduction

For the past couple of weeks I’ve been trying to write an article explaining briefly what p-values are and what they really measure. Turns out there are enough subtleties involved that I keep writing and writing and haven’t published anything. So I’ve decided that it’s time for a change of tactic.

I’m going to work my way up to p-values, explaining in detail each of the pieces. Then, when I’m done, I will write a summary that just links back to the longer explanations and hopefully I’ll be able then to summarize the journey and write a more succinct explanation.

This is the first installment of the series, and it deals with the basic idea of probabilities.

For Math Geeks
In these boxes you’ll find formal definitions that are intended to complement the main text. If you are not a math geek, you can safely ignore these.

Let’s start at the very beginning,
a very good place to start.
– Maria (The Sound of Music)

## In the beginning there were probabilities

The idea behind naïve probabilities is simple. You have a Universe of all possible outcomes of some experiment (sometimes called a sample space and denoted by the Greek letter Omega: $\Omega$), and you are interested in some subset of them, namely some event (denoted by $E$). The probability of event $E$ occurring is the cardinality (number of elements) of $E$ over all the possible outcomes (cardinality of $\Omega$).

$P(E)=\displaystyle\frac{|E|}{|\Omega|}$

Say you are throwing a pair of dice. How many possible outcomes of this experiment can there be? If you ignore the possible but unlikely event that one of the dice will land on its edge, there are 36 possible outcomes. That means that the probability of getting snake eyes (two ones) is 1/36. You could even enumerate all the outcomes and construct a set like {(1, 1), (1, 2), … (6, 6)} where each pair (x, y) represents die 1 landing on x and die 2 landing on y.
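The counting definition translates directly into code. A minimal sketch of the dice example:

```python
from itertools import product

# The sample space Omega: every ordered pair (die 1, die 2).
omega = list(product(range(1, 7), repeat=2))
print(len(omega))  # 36

# The event "snake eyes" is the subset {(1, 1)}.
snake_eyes = [outcome for outcome in omega if outcome == (1, 1)]
print(len(snake_eyes) / len(omega))  # 0.02777... = 1/36
```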

I said naïve before because this assignment of probabilities makes a couple of implicit assumptions about the sample space and the events. First of all, it assumes that the sample space is finite. I’m going to completely ignore infinite sample spaces and instead focus on the second implicit assumption: that each outcome is equally likely.

What if some outcomes are more likely than others? For example, what if the dice are loaded? All of a sudden 1/36 doesn’t look like such a good probability assignment for snake eyes.

In the general sense, you don’t have to assign equal probabilities to each of the outcomes. It’s usually just a good starting point to assume that this is the case. But if you know that this is not the case, then starting with equal probabilities is not very smart.

As an example, in the Monty Hall problem, if your second cousin thrice removed is part of the staff and he lets you in on the fact that the car is not behind door number three, that completely changes the problem. You would never assign P = 1/3 to each of the doors. You know for a fact that the probability of the car being behind door number three is exactly zero.

In a general sense then, probabilities can’t be defined by just counting possible outcomes. They must be defined as general functions that map a set of outcomes to numbers between zero and one. They must, of course, satisfy some special properties.

For Math Geeks
A probability function $P$ maps events (subsets of the sample space $\Omega$) to numbers in the interval $[0,1]$, and satisfies the following three properties:

1. $P(E) \ge 0 \textrm{ for every } E$
2. $P(\Omega) = 1$
3. $\textrm{if }E_1, E_2, \ldots \textrm{ are disjoint, then }$
$P\left(\displaystyle\bigcup_{i=1}^{\infty}E_i\right) = \displaystyle\sum_{i=1}^{\infty}P(E_i)$
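For a finite sample space these three properties can be checked mechanically. Here is a sketch using a loaded die; the weights are made up for illustration, and any non-negative weights summing to one would do:

```python
# A loaded die: outcome -> probability. The weights are invented;
# note they are NOT all equal, unlike the naive 1/6 assignment.
loaded = {1: 0.25, 2: 0.25, 3: 0.125, 4: 0.125, 5: 0.125, 6: 0.125}

def P(event):
    """Probability of an event, i.e. a subset of the sample space."""
    return sum(loaded[outcome] for outcome in event)

assert all(p >= 0 for p in loaded.values())   # 1. P(E) >= 0 for every E
assert P(loaded.keys()) == 1.0                # 2. P(Omega) = 1
evens, odds = {2, 4, 6}, {1, 3, 5}            # disjoint events
assert P(evens | odds) == P(evens) + P(odds)  # 3. additivity over disjoint events

print(P({1}))  # 0.25 -- not the 1/6 a fair die would give
```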

But notice that the previous definition of probabilities ($|E|/|\Omega|$) was very handy in the sense that just by knowing the cardinalities we had the appropriate probabilities. If we assign the probabilities unevenly, how do we describe them without having to enumerate each one individually?

This is where probability distributions help. And that will be the subject of the next post.

Written by ob

April 3rd, 2010 at 10:17 pm

Posted in Math


## The Two Envelopes Problem


A recent thread in reddit about the two envelopes problem reminded me of how unintuitive probabilities can be. There is a fundamental flaw with how the original post worded the problem:

You and I both have envelopes filled with money. My envelope contains either double or half the amount of money that’s in yours. If you want, I’m going to let you switch envelopes. Should you stay, switch, or does it not matter?

Stop right there. Read that again. Does that make sense to you? See if you can set up the experiment in the real world. Grab two envelopes and some money. Take \$100 and put it in an envelope, then, uh… what do you do with the other envelope? Put \$50 in it? Put \$200?

The original poster then went ahead and solved it assuming probabilities of 1/2. But I’m not really choosing between two equally likely options. I don’t know the probability distributions!

A better wording for the problem (or a different problem if you’re pedantic) would be:

There are two envelopes. One contains double the amount of money than the other. You choose one of them, I take the other. Now I offer you a choice between keeping your original choice or switching envelopes with me. Is it to your advantage to switch?

The answer should be obvious: it doesn’t matter.
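A simulation of the well-posed version bears this out. A quick sketch (the dollar amounts are arbitrary; only the x vs. 2x structure matters):

```python
import random

def play(switch, trials=100_000):
    """Two envelopes hold x and 2x; pick one at random, then
    optionally switch. Return the average amount won."""
    total = 0.0
    for _ in range(trials):
        x = random.uniform(1, 100)    # arbitrary smaller amount
        envelopes = [x, 2 * x]
        random.shuffle(envelopes)
        mine, other = envelopes
        total += other if switch else mine
    return total / trials

random.seed(0)
print(play(switch=False))  # both strategies average the same:
print(play(switch=True))   # (x + 2x) / 2 = 1.5x either way
```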

Written by ob

October 6th, 2009 at 10:13 pm

Posted in Math


## The Monty Hall problem

After all the positive feedback I got from Visualizing Bayes’ theorem, I thought I’d post my explanation of the Monty Hall problem. I was fascinated for a while with this problem because at first it doesn’t seem to make any sense. And most of the explanations I’ve seen have a magic feel to them. I’ve even seen people with math backgrounds argue against the result.

I think the problem has to do with most of the explanations mixing two very different probability classes. But I’m getting ahead of myself.

## The Problem

The problem statement (from Wikipedia) is:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice? (Schrock 1990) [1]

If you are like most people, your gut feeling is going to be “it does not matter. There is a 50-50 chance that the car is behind the door I have already selected”. Like most people, you would be wrong.

In order to solve this problem, you need to consider two very distinct pieces of information. One is “what state is the world in?”, the other is “what events have occurred?”. Let me elaborate.

## The States of the world

When I deal with probabilities and get confused, I revert to counting. What are all the possible outcomes? What subset of those outcomes am I looking at? So let’s look at all the possible states.

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats.

Let’s translate that to a simple diagram. We have three doors, let’s label them A, B, and C. We also know that there is a car behind one of the doors, it can be behind door A, B, or C. So we have three possible configurations of the world:

The door with the car has been colored red. The possible states of the world are Sa (meaning the car is behind door A), Sb (car behind door B), and Sc (car behind door C).

Quick, what is the probability of the car being behind door A? Or to put it another way, what is the probability that the world is in state Sa? Note that these two questions are the same, but the latter reminds us of the possible states. It is easy then to see that

$P(S_a) = P(S_b) = P(S_c) = \displaystyle\frac{1}{3}$

Now I’m going to ask you a very different question. Assume that instead of playing, you are observing your friend play the game. What is the probability that he will pick door A? Did you say 1/3? Of course you did, there are three doors and he must pick one. However, and this is really important, these probabilities are completely independent of the probabilities above!

Think of it this way. If two doors had cars behind them instead of just one, would that change the number of states the world can be in? Would it change the probability of your friend choosing door A? What if all doors had cars? Now there is just a single state for the world, but your friend can still choose door A with 1/3 probability!

If we now say that A is the event of your friend opening door A, we can easily see that

$P(A) = P(B) = P(C) = \displaystyle\frac{1}{3}$

which is just saying each door has an equal probability of being opened.

The trick to solving this problem is that you are dealing with two different classes of probabilities. One is the probability that the world is in a given state (a priori), the other is that a participant (you or Monty) chooses any given door. If you can keep these two classes of probabilities separate, you will be able to easily solve this problem.

Let’s see what happens once you have made the choice. Let’s assume you chose the door labeled A.

## What does Monty do?

Monty is, of course, the host of the game show. And he is trying to screw you. He doesn’t want you to get the car. He also happens to know where the car is, or in other words, he knows the state of the world. You, on the other hand, are only guessing. You chose door A with probability 1/3; now he gets to choose a door. But there are only two doors left. He must choose either door B or door C.

So how does Monty choose? What is the probability that Monty will open the door labeled B? Here’s what the world looks like to Monty:

Door A has been chosen, he must choose between doors B and C. Were he to choose randomly, each door would be equally probable,

$P(B_m) = P(C_m) = \displaystyle\frac{1}{2}$

The little m’s are a reminder that this is Monty’s choice, not your original choice. If you wanted to be pedantic you could add P(Am) = 0 since Monty can never choose door A (you chose that one).

But he knows better than to choose randomly, he knows whether we’re in Sa, Sb, or Sc! How he chooses will depend on what the state of the world is. If we are in Sa, it doesn’t matter what he chooses, he’ll choose B or C with equal probability as neither has a car[2]. That is to say “given that we are in state Sa, Monty chooses door B with probability 1/2”, in other words:

$P(B_m|S_a) = \displaystyle\frac{1}{2}$

If we are in state Sb, Monty will never choose door B, that would be giving away the prize.

$P(B_m|S_b) = 0$

And finally, if we are in state Sc, Monty will always choose door B.

$P(B_m|S_c) = 1$

And now it’s your turn to make a decision. Do you stay with your original choice of door A? Or do you switch and choose door C?

First of all, does it matter? Is there any difference whether we switch or not? The answer is a resounding YES! Monty gave away information about the state of the world by choosing door B. We can use that information.

What we are really trying to figure out is this: “what configuration is the world in?” Monty knows, he used that information to choose door B. Now we have to ask ourselves, “given that Monty chose door B, what is the probability that the world is in state Sa versus the probability of the world being in state Sc?” Note that we have eliminated Sb because Monty has already opened door B and there was no car!

Here is where we can apply Bayes’ formula to get an answer:

$P(S_a|B_m) = \displaystyle\frac{P(B_m|S_a)P(S_a)}{P(B_m)}$

where $P(B_m) = P(B_m|S_a)P(S_a) + P(B_m|S_b)P(S_b) + P(B_m|S_c)P(S_c) = \frac{1}{2}\cdot\frac{1}{3} + 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3} = \frac{1}{2}$, and plugging in the values we have already computed:

$P(S_a|B_m) = \displaystyle\frac{\frac{1}{2}\cdot\frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}$

Which means we have a 1/3 chance the car is behind our original choice of door A. Note that this 1/3 probability is different from the original P(A) = 1/3 we had computed. It ends up being the same number, but it is really a different probability!

What about the car being behind door C? What is the probability that we are in state Sc, given that Monty chose door B?

$P(S_c|B_m) = \displaystyle\frac{P(B_m|S_c)P(S_c)}{P(B_m)}$

and plugging in the values we get:

$P(S_c|B_m) = \displaystyle\frac{1\cdot\frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$

Which means we have a 2/3 chance the car is behind the other door (door C). Therefore we should switch.
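If the derivation feels too slick, the 1/3 vs. 2/3 split can also be checked empirically. A quick simulation of the game as stated (Monty knows where the car is, and picks randomly when both remaining doors hide goats):

```python
import random

def monty_hall(switch, trials=100_000):
    """Play the game many times; return the fraction of wins."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)              # state of the world
        pick = random.randrange(3)             # your first choice
        goats = [d for d in range(3) if d != pick and d != car]
        opened = random.choice(goats)          # Monty opens a goat door
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

random.seed(42)
print(monty_hall(switch=False))  # stay:   ~ 1/3
print(monty_hall(switch=True))   # switch: ~ 2/3
```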

## Does it matter if Monty knows?

The only reason we managed to gain any information from Monty’s choice of door B is that he knows the state of the world and acts differently depending on what state is the correct one. If Monty didn’t know, he would have been choosing randomly between doors B and C and we would not have gained any information (except that Sb is not possible since there was no car behind door B).

If Monty always chooses door B when neither B nor C have the car, we would not have gained any information when he opens door B, but we would gain information if he opens door C (do you see why?).

Note that switching no longer helps in this case: the probabilities for the states Sa and Sc are now the same (1/2). But it doesn’t hurt either, so if we don’t know whether Monty knows or not, we might still get an advantage from switching.
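This case, too, can be simulated. Below, Monty picks blindly among the other two doors; rounds where he exposes the car are thrown out, since we are conditioning on his having revealed a goat:

```python
import random

def ignorant_monty(switch, trials=100_000):
    """Monty opens one of the other two doors at random. Rounds where
    he reveals the car are discarded; return the win fraction."""
    wins = kept = 0
    while kept < trials:
        car = random.randrange(3)
        pick = random.randrange(3)
        opened = random.choice([d for d in range(3) if d != pick])
        if opened == car:
            continue                  # car exposed: doesn't count
        kept += 1
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / kept

random.seed(7)
print(ignorant_monty(switch=False))  # ~ 1/2: no information gained
print(ignorant_monty(switch=True))   # ~ 1/2: switching neither helps nor hurts
```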

1. If you are really pedantic, the mathematically correct definition of the problem is: “Suppose you’re on a game show and you’re given the choice of three doors. Behind one door is a car; behind the others, goats. The car and the goats were placed randomly behind the doors before the show. The rules of the game show are as follows: After you have chosen a door, the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it. If both remaining doors have goats behind them, he chooses one randomly. After Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and the host opens Door 3, which has a goat. He then asks you “Do you want to switch to Door Number 2?” Is it to your advantage to change your choice? (Krauss and Wang 2003:10)” []
2. This is important, if he doesn’t choose either door randomly, you gain information asymmetrically. Keep reading. []

Written by ob

May 5th, 2009 at 7:27 am

Posted in Math


## Visualizing Bayes’ theorem

I recently came up with what I think is an intuitive way to explain Bayes’ Theorem. I searched Google for a while and could not find any article that explains it in this particular way.

Of course there’s the wikipedia page, that long article by Yudkowsky, and a bunch of other explanations and tutorials. But none of them have any pictures. So without further ado, and with all the chutzpah I can gather, here goes my explanation.

## Probabilities

One of the easiest ways to understand probabilities is to think of them in terms of Venn Diagrams. You basically have a Universe with all the possible outcomes (of an experiment for instance), and you are interested in some subset of them, namely some event. Say we are studying cancer, so we observe people and see whether they have cancer or not. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual, either he has cancer or not. We can then split our universe in two events: the event “people with cancer” (designated as A), and “people with no cancer” (or ~A). We could build a diagram like this:

So what is the probability that a randomly chosen person has cancer? It is just the number of elements in A divided by the number of elements of U (the Universe). We denote the number of elements of A as |A|, and read it “the cardinality of A”. And define the probability of A, P(A), as

$P(A) = \displaystyle\frac{|A|}{|U|}$

Since A can have at most the same number of elements as U, the probability P(A) can be at most one.

Good so far? Okay, let’s add another event. Let’s say there is a new screening test that is supposed to measure something. That test will be “positive” for some people, and “negative” for some other people. If we take the event B to mean “people for whom the test is positive”, we can create another diagram:

So what is the probability that the test will be “positive” for a randomly selected person? It is the number of elements of B (cardinality of B, or |B|) divided by the number of elements of U. We call this P(B), the probability of event B occurring.

$P(B) = \displaystyle\frac{|B|}{|U|}$

Note that so far, we have treated the two events in isolation. What happens if we put them together?

We can compute the probability of both events occurring (AB is a shorthand for A∩B) in the same way.

$P(AB) = \displaystyle\frac{|AB|}{|U|}$

But this is where it starts to get interesting. What can we read from the diagram above?

We are dealing with an entire Universe (all people), the event A (people with cancer), and the event B (people for whom the test is positive). There is also an overlap now, namely the event AB which we can read as “people with cancer and with a positive test result”. There is also the event B – AB or “people without cancer and with a positive test result”, and the event A – AB or “people with cancer and with a negative test result”.

Now, the question we’d like answered is “given that the test is positive for a randomly selected individual, what is the probability that said individual has cancer?”. In terms of our Venn diagram, that translates to “given that we are in region B, what is the probability that we are in region AB?” or stated another way “if we make region B our new Universe, what is the probability of A?”. The notation for this is P(A|B) and it is read “the probability of A given B”.

So what is it? Well, it should be

$P(A|B) = \displaystyle\frac{|AB|}{|B|}$

And if we divide both the numerator and the denominator by |U|

$P(A|B) = \displaystyle\frac{|AB|/|U|}{|B|/|U|}$

we can rewrite it using the previously derived equations as

$P(A|B) = \displaystyle\frac{P(AB)}{P(B)}$

What we’ve effectively done is change the Universe from U (all people), to B (people for whom the test is positive), but we are still dealing with probabilities defined in U.

Now let’s ask the converse question “given that a randomly selected individual has cancer (event A), what is the probability that the test is positive for that individual (event AB)?”. It’s easy to see that it is

$P(B|A) = \displaystyle\frac{|AB|}{|A|} = \frac{P(AB)}{P(A)}$

Now we have everything we need to derive Bayes’ theorem, putting those two equations together we get

$P(A|B)P(B) = P(AB) = P(B|A)P(A)$

which is to say P(AB) is the same whether you’re looking at it from the point of view of A or B, and finally

$P(A|B) = \displaystyle\frac{P(B|A)P(A)}{P(B)}$

Which is Bayes’ theorem. I have found that this Venn diagram method lets me re-derive Bayes’ theorem at any time without needing to memorize it. It also makes it easier to apply it.

## Example

Take the following example from Yudkowsky:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammograms. 9.6% of women without breast cancer will also get positive mammograms. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

First of all, let’s consider the women with cancer

Now add the women with positive mammograms, note that we need to cover 80% of the area of event A and 9.6% of the area outside of event A.

It is clear from the diagram that if we restrict our universe to B (women with positive mammograms), only a small percentage actually have cancer.  According to the article, most doctors guessed that the answer to the question was around 80%, which is clearly impossible looking at the diagram!

Note that the efficacy of the test is given in the context of A: “80% of women with breast cancer will get positive mammograms”. This can be interpreted as “restricting the universe to just A, what is the probability of B?”, in other words P(B|A).

Even without an exact Venn diagram, visualizing the diagram can help us apply Bayes’ theorem:

• 1% of women in the group have breast cancer → P(A) = 0.01
• 80% of those women get a positive mammogram, and 9.6% of the women without breast cancer get a positive mammogram too → P(B) = 0.8 P(A) + 0.096 (1 – P(A)) = 0.008 + 0.09504 = 0.10304
• we can get P(B|A) straight from the problem statement, remember 80% of women with breast cancer get a positive mammogram → P(B|A) = 0.8

Now let’s plug those values into Bayes’ theorem

$P(A|B) = \displaystyle\frac{P(B|A)P(A)}{P(B)} = \frac{0.8 \times 0.01}{0.10304}$

which is 0.0776 or about a 7.8% chance of actually having breast cancer given a positive mammogram.
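The same arithmetic in code form, using only the three numbers from the problem statement:

```python
p_cancer = 0.01            # P(A):    1% prevalence in the group
p_pos_if_cancer = 0.80     # P(B|A):  true positive rate
p_pos_if_healthy = 0.096   # P(B|~A): false positive rate

# Total probability of a positive mammogram, P(B).
p_pos = p_pos_if_cancer * p_cancer + p_pos_if_healthy * (1 - p_cancer)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
p_cancer_if_pos = p_pos_if_cancer * p_cancer / p_pos

print(round(p_pos, 5))            # 0.10304
print(round(p_cancer_if_pos, 4))  # 0.0776
```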

Written by ob

May 1st, 2009 at 7:11 am

Posted in Math
