Visualizing Bayes’ theorem
I recently came up with what I think is an intuitive way to explain Bayes’ Theorem. I searched on Google for a while and could not find any article that explains it in this particular way.
Of course there’s the Wikipedia page, that long article by Yudkowsky, and a bunch of other explanations and tutorials. But none of them have any pictures. So without further ado, and with all the chutzpah I can gather, here goes my explanation.
Probabilities
One of the easiest ways to understand probabilities is to think of them in terms of Venn Diagrams. You basically have a Universe with all the possible outcomes (of an experiment for instance), and you are interested in some subset of them, namely some event. Say we are studying cancer, so we observe people and see whether they have cancer or not. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual, either he has cancer or not. We can then split our universe in two events: the event “people with cancer” (designated as A), and “people with no cancer” (or ~A). We could build a diagram like this:
So what is the probability that a randomly chosen person has cancer? Assuming every person is equally likely to be chosen, it is just the number of elements in A divided by the number of elements of U (the Universe). We denote the number of elements of A as |A|, and read it as “the cardinality of A”. We then define the probability of A, P(A), as
P(A) = |A| / |U|
Since A can have at most the same number of elements as U, the probability P(A) can be at most one.
Good so far? Okay, let’s add another event. Let’s say there is a new screening test that is supposed to measure something. That test will be “positive” for some people, and “negative” for some other people. If we take the event B to mean “people for whom the test is positive”, we can create another diagram:
So what is the probability that the test will be “positive” for a randomly selected person? It would be the number of elements of B (the cardinality of B, or |B|) divided by the number of elements of U. We call this P(B), the probability of event B occurring.
P(B) = |B| / |U|
Note that so far, we have treated the two events in isolation. What happens if we put them together?
We can compute the probability of both events occurring (AB is a shorthand for A∩B) in the same way.
P(AB) = |AB| / |U|
But this is where it starts to get interesting. What can we read from the diagram above?
We are dealing with an entire Universe (all people), the event A (people with cancer), and the event B (people for whom the test is positive). There is also an overlap now, namely the event AB which we can read as “people with cancer and with a positive test result”. There is also the event B – AB or “people without cancer and with a positive test result”, and the event A – AB or “people with cancer and with a negative test result”.
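The regions described above can be sketched with Python sets. The toy universe of 20 numbered people, and which of them fall in A and B, are made up purely for illustration; the helper name `prob` is also just a label chosen here.

```python
from fractions import Fraction

# Hypothetical toy universe: 20 people, identified by number.
U = set(range(20))
A = {0, 1, 2, 3, 4}        # people with cancer (made-up membership)
B = {3, 4, 5, 6, 7, 8}     # people with a positive test (made-up membership)

def prob(event, universe):
    """P(event) = |event| / |universe|, assuming each element is equally likely."""
    return Fraction(len(event & universe), len(universe))

AB = A & B                 # cancer and a positive test
only_B = B - AB            # no cancer, but a positive test
only_A = A - AB            # cancer, but a negative test

p_a = prob(A, U)           # |A| / |U|
p_b = prob(B, U)           # |B| / |U|
p_ab = prob(AB, U)         # |AB| / |U|

print(p_a, p_b, p_ab)
print(sorted(only_B), sorted(only_A))
```

Using `Fraction` keeps the ratios exact, so the counts in the diagram and the probabilities stay visibly the same thing.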
Now, the question we’d like answered is “given that the test is positive for a randomly selected individual, what is the probability that said individual has cancer?”. In terms of our Venn diagram, that translates to “given that we are in region B, what is the probability that we are in region AB?” or stated another way “if we make region B our new Universe, what is the probability of A?”. The notation for this is P(A|B) and it is read “the probability of A given B”.
So what is it? Well, it should be
P(A|B) = |AB| / |B|
And if we divide both the numerator and the denominator by |U|
P(A|B) = (|AB| / |U|) / (|B| / |U|)
we can rewrite it using the previously derived equations as
P(A|B) = P(AB) / P(B)
What we’ve effectively done is change the Universe from U (all people), to B (people for whom the test is positive), but we are still dealing with probabilities defined in U.
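A quick way to check the change of Universe is to compute P(A|B) both ways: by counting inside B, and as a ratio of probabilities defined in U. This is only a sketch; the sets are the same made-up toy data as before and `prob` is a hypothetical helper.

```python
from fractions import Fraction

# Made-up toy data, for illustration only.
U = set(range(20))
A = {0, 1, 2, 3, 4}        # people with cancer
B = {3, 4, 5, 6, 7, 8}     # people with a positive test

def prob(event, universe):
    """|event restricted to universe| / |universe|, equally likely elements assumed."""
    return Fraction(len(event & universe), len(universe))

# Make B the new Universe and count directly: |AB| / |B|
by_counting = prob(A, B)

# Stay in U and divide the two probabilities: P(AB) / P(B)
by_ratio = prob(A & B, U) / prob(B, U)

print(by_counting, by_ratio)
```

Both expressions give the same fraction because the |U| in the numerator and the denominator cancels.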

Now let’s ask the converse question “given that a randomly selected individual has cancer (event A), what is the probability that the test is positive for that individual (event B)?”. It’s easy to see that it is
P(B|A) = P(AB) / P(A)
Now we have everything we need to derive Bayes’ theorem. Putting those two equations together we get
P(AB) = P(A|B) P(B) = P(B|A) P(A)
which is to say P(AB) is the same whether you’re looking at it from the point of view of A or B, and finally
P(A|B) = P(B|A) P(A) / P(B)
Which is Bayes’ theorem. I have found that this Venn diagram method lets me re-derive Bayes’ theorem at any time without needing to memorize it. It also makes it easier to apply it.
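As a sanity check, the theorem can be compared against direct counting on a small example. The sets below are made up, and `prob` and `bayes` are names chosen here for illustration, not part of any library.

```python
from fractions import Fraction

# Made-up toy data, for illustration only.
U = set(range(20))
A = {0, 1, 2, 3, 4}        # people with cancer
B = {3, 4, 5, 6, 7, 8}     # people with a positive test

def prob(event, universe):
    """|event restricted to universe| / |universe|, equally likely elements assumed."""
    return Fraction(len(event & universe), len(universe))

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

direct = prob(A, B)                                   # count inside B
via_bayes = bayes(prob(B, A), prob(A, U), prob(B, U)) # go through P(B|A)

print(direct, via_bayes)
```

The two values agree exactly, which is the "same overlap seen from A or from B" argument in code form.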
Example
Take the following example from Yudkowsky:
1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammograms. 9.6% of women without breast cancer will also get positive mammograms. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
First of all, let’s consider the women with cancer
Now add the women with positive mammograms. Note that we need to cover 80% of the area of event A and 9.6% of the area outside of event A.
It is clear from the diagram that if we restrict our universe to B (women with positive mammograms), only a small percentage actually have cancer. According to the article, most doctors guessed that the answer to the question was around 80%, which is clearly impossible looking at the diagram!
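The areas in the diagram can also be turned into head counts. The sketch below assumes a hypothetical group of 100,000 women (the group size is an arbitrary choice, picked so that every count comes out as a whole number) and applies the three percentages from the problem statement.

```python
# Hypothetical group size, chosen only so the counts are whole numbers.
population = 100_000

with_cancer = population * 1 // 100             # 1% have breast cancer
without_cancer = population - with_cancer

true_positives = with_cancer * 80 // 100        # 80% of A test positive
false_positives = without_cancer * 96 // 1000   # 9.6% of ~A test positive

all_positives = true_positives + false_positives  # this is region B

# Restrict the universe to B: what share of it actually has cancer?
share_with_cancer = true_positives / all_positives

print(true_positives, false_positives, all_positives)
print(f"{share_with_cancer:.4f}")
```

Most of region B is made up of women without cancer, simply because there are so many more of them to begin with.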
Note that the efficacy of the test is given from the context of A, “80% of women with breast cancer will get positive mammograms”. This can be interpreted as “restricting the universe to just A, what is the probability of B?” or in other words P(B|A).
Even without an exact Venn diagram, visualizing the diagram can help us apply Bayes’ theorem:
- 1% of women in the group have breast cancer → P(A) = 0.01
- 80% of those women get a positive mammogram, and 9.6% of the women without breast cancer get a positive mammogram too → P(B) = 0.8 P(A) + 0.096 (1 – P(A)) = 0.008 + 0.09504 = 0.10304
- we can get P(B|A) straight from the problem statement, remember 80% of women with breast cancer get a positive mammogram → P(B|A) = 0.8
Now let’s plug those values into Bayes’ theorem
P(A|B) = P(B|A) P(A) / P(B) = (0.8 × 0.01) / 0.10304
which is 0.0776 or about a 7.8% chance of actually having breast cancer given a positive mammogram.
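The same arithmetic as a small Python function, in case you want to try other numbers. The function and parameter names (`posterior`, `prior`, `p_pos_given_cancer`, `p_pos_given_no_cancer`) are made up for this sketch; the three input values are the ones from the problem statement.

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_no_cancer):
    """Return P(A|B) using P(B) = P(B|A) P(A) + P(B|~A) (1 - P(A))."""
    p_b = p_pos_given_cancer * prior + p_pos_given_no_cancer * (1 - prior)
    return p_pos_given_cancer * prior / p_b

# Values from the mammogram example: P(A) = 0.01, P(B|A) = 0.8, P(B|~A) = 0.096
p = posterior(0.01, 0.8, 0.096)
print(f"{p:.4f}")  # 0.0776

# The answer depends heavily on the base rate P(A); try a few other priors.
for prior in (0.001, 0.01, 0.1):
    print(prior, round(posterior(prior, 0.8, 0.096), 4))
```

Changing only the prior while keeping the test fixed shows why the 80% figure alone cannot answer the question.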

Very well explained! I’ve been reviewing the fundamentals of Bayes theory recently to try to get my head around Bayesian networks from a non stats background. Don’t suppose you could point me in the direction of a similarly visual explanation for that, or maybe consider it in a future post?
Rob
1 May 09 at 1:37 pm
Bravo! I actually did the same thing with Venn diagrams to explain it to myself awhile back – I’m glad you took the time to make it clear to others.
Tom
1 May 09 at 1:59 pm
How can it be hard to derive Bayes formula if left and right part is essentially the same (A becomes B, B becomes A).
James Bond
1 May 09 at 2:06 pm
http://yudkowsky.net/rational/bayes
Dmitriy Kropivnitskiy
1 May 09 at 2:11 pm
This was fantastic. Thank you.
bumbledraven
1 May 09 at 2:37 pm
Best explanation of Bayesian, ever, period!!
Doug
1 May 09 at 3:01 pm
Great job man! Visualizing this fascinating theorem was a great idea. I dunno though, I always found the actual examples for Bayes’s Theorem to be misleading… I guess that’s just the nature of statistics in odd contexts though!
CJ Cenizal
1 May 09 at 3:14 pm
Dmitriy: Linked in the intro.
Oscar: “Of course there’s the wikipedia page, that long article by Yudkowsky, and a bunch of other explanations and tutorials. But none of them have any pictures.” <– the Java applets/visualizations don’t count as pictures?
gwern
1 May 09 at 6:31 pm
Most beautiful and simple explanation. Thanks a lot
Vinay
1 May 09 at 9:40 pm
Thanks a lot! This was the first time that Bayes actually made sense to me. It’s much easier to understand when visualized.
BD
1 May 09 at 9:47 pm
You are right, memorizing this method gives a hard-to-forget method for rederiving and re-learning Bayes’ rule.
Tom
1 May 09 at 11:41 pm
Thank you for this article!
I really liked the last bit, where you showed the cancer example. That was a good example of “applying the theory shows why you should learn the theory”.
Jens
Jens
2 May 09 at 2:33 am
Nice. I created something similar a while back due to constantly having to explain this in #math on IRC:
http://imagebin.org/47539
The application to mammograms is a nice one though. Bookmarked!
David House
2 May 09 at 3:38 am
This is the absolute best derivation of Bayes theorem I have ever seen, including a course in statistics I did last year. Your post definitely needs to go up on the wikipedia page, under the intuitive reasoning part.
J
2 May 09 at 5:17 am
Hm. Bayesian networks… I don’t know of any “visual explanation” of those. I’ll take a look and see if I can come up with something. Thanks for the comment.
ob
2 May 09 at 10:19 am
If I understand your question correctly, you’re saying “what happens when A and B are the same?”. In that case, P(AB) = P(A) = P(B), and P(A|B) = 1. Which is saying “given that event A occurred, what is the probability of event A occurring?”. It should be pretty obvious that it is 1.
ob
2 May 09 at 10:21 am
True, the Yudkowsky article has some pictures, and if you keep Java off like I do, all you see are the pictures. My main problem with Yudkowsky’s article is that it’s really, really, really, long! If you are looking to just get the gist of what Bayes’ theorem is saying you are not going to read that much.
ob
2 May 09 at 10:24 am
This is great. Very useful.
I don’t want to be a pedant, and I may be wrong, but aren’t these technically Euler diagrams and not Venn diagrams?
Patrick
2 May 09 at 10:58 am
Well, if you consider the fact that the sets are only defined within the Universe, then they are indeed showing all the possible intersections. Thus they are Venn diagrams.
ob
2 May 09 at 6:05 pm
Wow.
I’ve read quite a few explanations of Bayes’ theorem and this is the first which (in the space of really a few minutes!) has got the point across so succinctly and clearly I now actually understand it.
My thanks.
Clay
4 May 09 at 2:59 am
Brilliant! Thank you!
Abhijith
4 May 09 at 4:44 am
Thanks a lot!!!
Andries Inzé
4 May 09 at 6:58 am
Thank you for re-sparking my love for Discrete Mathematics.
I fondly remember studying this, years ago, but had forgotten the enjoyment of it.
I eagerly await any further of your works.
Perhaps animated…?, perhaps interactive…?
Again Thanks.
Stephen Bell
4 May 09 at 10:52 am
This should be made into a visualization app for Google’s new public data search and frankly any data set. Anyone want to work on it with me?
Patrick Koppula
5 May 09 at 12:27 pm
Great job, short, clear and effective – thanks for taking the time to create it!
Avi
5 May 09 at 2:24 pm
The exact problem at the end is a good example of how stats can confuse the lay man. Test accuracy should be reported in the rate of false negatives, rather than true positives. The cost of a large number of false negatives can be great relative to a large number of false positives. However, depending on your situation, you could have a lot of false negatives AND false positives, indicating that your test is not well-correlated for the condition being tested for. In fact, just calculating the correlation coefficient is an important step.
Veggie
5 May 09 at 3:34 pm
Great Job, and excellent example :P | Muy Buen trabajo y excelentes ejemplos!
Edwin Pardo
7 May 09 at 7:24 pm
Great explanation! I teach high school mathematics and am stealing your visualization, example and all, to use in one of my next lessons! :-)
Andrea
10 May 09 at 2:18 pm
hmm… i think the easiest way to understand bayes theorem is to multiply both sides to get..
P(B) P(A|B) = P(A) P(B|A)
both of these are just P(A intersect B) – probability of A and B is the same as the probability of A times the probability of B given A, or the probability of B times the probability of A given B
jeff wu
28 May 09 at 3:10 pm
David Newman’s book, Hippocrates’ Shadow, Simon&Schuster 2008 has a good treatment of this in relation to medical interventions and clinical trials. He uses a bar graph variant to demonstrate things Bayesian. Good, but not as good as your demonstration! His discussion of this topic sent me to the web where I finally and gratefully got to your visualization! Thanks.
AlexPirie
30 May 09 at 6:51 pm
i n really convay to this method of learning of process
ael
5 Jun 09 at 12:26 am
this is freagin awesome explanation. thank u :)
Milz
8 Jun 09 at 6:45 pm
I too teach mathematics and am impressed with your clear explanation.
Well done, and thanks for taking the time to lay it out.
Ken walker
10 Jun 09 at 7:09 am
An addendum that “completes” the Bayes theorem so that the denominator is the same as in the Yudkowsky article would be nice. Graphically, that would be saying that the area of B is the “union” of two sections: 1. the non-overlap, which is areas in B “starting with” the areas that are NOT in A (the ~A part of the universe) i.e. P(B|~A) * P(~A), 2. the overlap, which is areas in B “starting with” the areas that ARE in A i.e. P(B|A) * P(A). It’s more helpful for applications in which the data makes it easier to figure out probabilities of A and relate those to B, than to figure out probabilities of B.
Reedo
11 Jun 09 at 4:24 am
Professor Oscar,
You are a great prof! You set a benchmark for us trying-to-be’s.
A question. When can I write
P(AB/R) = P(A/R) + P(B/R) ?
What conditions need to be imposed on A, B and R?
Thanks Professor!
Tapan Bagchi at the Indian Institute of Technology Kharagpur
Tapan Bagchi
16 Jul 09 at 6:48 pm
thank you very much…very nice article…
desiNerd,
IITKGP
desiNerd
19 Jul 09 at 3:46 am
that’s a really appealing way of expressing Bayes’ theorem
how would the venn-diagram be if the conditional probability of A
didn’t depend on B i.e P(A/B)=P(A) ?
pinki
14 Aug 09 at 2:23 am
Very nice article in every meaning of the word. Have you published elsewhere?
jozsef
14 Aug 09 at 12:31 pm
Great work dude….I was banging my head to develop intuition on Bayes theorem. Now I got the idea clearly…
Thank u very much..
CHARLY
22 Aug 09 at 3:19 am
May want to check the book by Wonnocott and Wonnocott. They have a visual way of explaining Bayes Theorem
Bill
23 Aug 09 at 3:57 pm
Beautiful explanation. I was searching for something that would help me explain it to undergrads…thanks a bunch.
Usman
16 Sep 09 at 10:35 am
This is one of the best explanations I’ve found. Perhaps we can see if I really understand it by trying a real world problem I’m wrestling with.
Here’s the data:
– The odds of a chest pain (CP) being caused by a heart attack is 40%.
– The odds of a CP being caused by other factors (anxiety, depression, etc.) is 60%.
– The odds of a heart attack occurring to a female above age 50 is 80%.
– The odds of a heart attack occurring to a female under age 50 is 20%.
I am presented with a 24 year old female who says she is having chest pain. What is the probability that her chest pain is caused by a heart attack? Is it 0.4 x 0.2 = 0.08?
Also, 78% of patients having heart attacks present with diaphoresis (sweating), so 22% of patients having heart attacks don’t sweat. This female is not sweating, so are the odds of her having a heart attack 0.22 x 0.08 = 0.0176?
Thank you!
Dan Weisberg
2 Oct 09 at 10:47 am
You beat me to it! I got the idea of thinking about Bayes’ theorem by Venn diagrams by reading Reza’s An Introduction to Information Theory.
Truecrimson
11 Oct 09 at 4:26 am
[...] want to explain the calculation by Venn diagrams but someone else beats me to it http://oscarbonilla.com/2009/05/visualizing-bayes-theorem/ There’re links to others’ presentations so pick your favorite. What I think in terms [...]
Bayes and Probability « A Diary
11 Oct 09 at 6:12 am
Can someone help me with this question?
One fifth of customers entering a certain Future Shop store are under 20 years old. 5% of these under 20 year olds make a purchase over $500, and 10% of the customers 20 years or older make a purchase over $500. What is the probability that if a major purchase is made, it was by a person 20 years or older?
Riley
26 Nov 09 at 3:09 pm
Just wanted to convey my thanks.
I am teaching this subject to undergrads at a Chinese university and found your explanations and visuals a huge help.
Mark
8 Dec 09 at 6:58 pm
Good visual explanation. I used it to work out a probability but don’t know if the professor is going to reject it.
Thanks.
Franco
13 Dec 09 at 6:02 pm
this proof was awesome.
viky
9 Feb 10 at 10:57 pm
Really “Great” Job. Keep doing such good job, humanity will be enriched by people like you. Hats off to You!!
Amlan
15 Jun 10 at 6:01 am
Great! It’s subtle and effective. Thanks a lot.
Rishav
24 Aug 10 at 12:28 am
Very nice illustration of how to think of the probabilities.
Well done.
Christian
2 Nov 10 at 8:40 am
i want bayes theorem application
shalu
15 Nov 10 at 8:24 am
Thanks a lot. It was useful for me as a Bioinformatics student.
Good work! Wish you inspiration for making more good stuff!
Dada
22 Nov 10 at 9:12 am
great awesome .no words
munish dadhwal
29 Nov 10 at 9:45 am
U = me
A = Satisfaction with the explanation
p(A) = |A|/|U| = 1
The3rd_Chimp
13 Dec 10 at 3:34 pm
Very helpful demonstration.
For a nearly identical explanation of Bayes theorem using Venn diagrams, see Bolstad’s book, Introduction to Bayesian Statistics.
freethoughtful
19 Dec 10 at 7:17 pm
Excellent explanation!
plarser48
20 Jan 11 at 7:17 pm
Cool! I just tried to do something like this myself using a “sieve” analogy. Check it out HERE. Someone linked to your example and I agree it’s more intuitive, though I commented on one aspect of the sieve I like a little better HERE. Thanks for putting this together!
Hendy
9 Apr 11 at 9:07 pm
very useful. Thank you
antonio
9 May 11 at 5:32 pm
It is so great explanation.Now I have bookmarked your site.Now I regularly visit your site to enhance my understanding.
raghaw
27 May 11 at 4:16 am
This is the most concise and comprehensive explanation of Bayes theorem that I’ve ever read. Thanks a lot!
Joemar
25 Jun 11 at 2:52 pm
Thank you for your post, I enjoyed reading it.
May I ask… what software did you draw your diagrams with?
Carlos
4 Jul 11 at 9:49 pm
I drew the diagrams with OmniGraffle: http://www.omnigroup.com/products/omnigraffle/
ob
5 Jul 11 at 1:45 pm
This explanation is exactly what I was looking for. Hats off to you.
saleem
6 Jul 11 at 6:41 pm
How about a water/hydraulics method? People sometimes speak of ‘probability-mass’ or ‘probability-fluid’, so you could depict the network as a series of tubes changing in size and then the volume of output is the answer.
gwern
19 Jul 11 at 3:30 pm
Great common sense here. Wish I’d thought of that.
Early
22 Jul 11 at 2:50 pm
I’d like to use your explanation in an on-line high school class. Can I get you permission to copy it into our course material, with credit given to you as the source?
David Nelson
14 Aug 11 at 7:07 pm
I hate giving a “me too” answer, but I really must. This is the best explanation of Bayes’ theorem that I’ve ever seen. Great job!
Graham Percival
4 Sep 11 at 7:37 am
outstanding, very useful, will start using with my students
Ricardo
4 Sep 11 at 4:00 pm
[...] Visualizing Bayes’ Theorem by Oscar Bonilla. [...]
Visualizing Bayes’ Theorem « Another Word For It
4 Sep 11 at 5:12 pm
I drew up a similar explanation about a year ago to explain why, even though there are more poor whites than poor blacks in the world, the perception that most black people are poor is not incorrect. I used current census data at the time, and realized the person registering the complaint on twitter was making the very natural error of inverting probabilities. Here was my illustration: http://cl.ly/V2Y
Jeffrey Horn
22 Sep 11 at 10:31 pm
I should have said this was the natural error of ignoring base rates, not inverting probabilities.
Jeffrey Horn
22 Sep 11 at 10:38 pm
I think your explanation is flawed. You said P(A) = | A | / |U|, but that is only true if all elements of A have the same probability. In general that is a false statement.
luis
20 Oct 11 at 11:43 am
>80% of women with breast cancer will get positive mammograms. 9.6% of women
>without breast cancer will also get positive mammograms.
80% have cancer and 9.6% don’t have cancer (when you have positive mammogram). Why doesn’t 80 + 9.6 addup to 100%? What probabilities are remaining?
Damn! I wish I had slept in my math classes
Ganesh Krishnan
21 Oct 11 at 8:46 pm
I agree, I can’t understand how to figure out Pr (B) :(
I’ve got a stats module in my masters and having come from a non mathematical background… I’m struggling a bit.
Mary
22 Oct 11 at 11:57 am
P(B) is the total probability of B, it’s the chance of B happening regardless of whether A happens. Thus, P(B) = P(B|A)P(A) + P(B|¬A)P(¬A).
ob
27 Oct 11 at 11:42 am
80% of women with breast cancer will get a positive mammogram means that P(positive mammogram | woman has cancer) = 0.8.
The remaining 20% is the probability of getting a negative mammogram given that the woman has cancer.
The 9.6 comes from the other part of the population, i.e. women without cancer that also get a positive mammogram.
In general, P(B | A) + P(¬B | A) will sum to 1, but P(B|A) + P(B|¬A) will not as they are unrelated probabilities.
ob
27 Oct 11 at 11:46 am
I didn’t want to overcomplicate the example.
ob
27 Oct 11 at 11:46 am
very useful. thank you
Ann
1 Nov 11 at 12:24 am
Dear Oscar,
A clear presentation – a pleasure to read…
E. Fitzgerald
14 Nov 11 at 4:33 pm
Nicely done. A similar (and equally intuitive and effective) visual explanation of conditional probability and Bayes theorem based on Venn diagrams can be found in Bolstad’s textbook on Bayesian Statistics.
freethoughtful
25 Nov 11 at 10:41 pm
Very useful article
The Wiki page must have a link to such a good and easy-to-understand article.
Thank you.
Hemant Patel
8 Dec 11 at 2:27 pm
Outstanding! I’ve been trying to get my head around this for some time and your explanation made everything fall into place. Nicely done. Thanks for posting this.
Mark
22 Jan 12 at 8:14 am
If this is the basis of Bayes Theory than we can simplify all of this to say that, in any given circumstance, if no explicit variables are defined than the outcome of the equation, whether inversely or conversely calculated always equals an equal percentage of the number of circumstances… Or any equal division of the given number of arguments.
In your cancer example, you give explicitly defined variables which makes is possible to calculate an explicit answer.
I take this to mean that Bayes Theory is simply a definition of how to form the equation.
Am I right in my assumption?
BCPower
9 Feb 12 at 5:20 am
Furthermore, the simplified formula that I mentioned above could, consequently be compounded by the number of given circumstances. Assuming no variables are defines as the graphic at the top suggests, it could be said that the Bayes formula would compound in similar, possible equations much like possible combinations of a lock. That is to say, if given a combination lock with 4 dials, each with 2 numbers, your number of possible combinations are mathematically derived because we are in fact given two variables. The actual answer is not what I’m after but merely to say that it is, in fact, calculable.
To bring my previous derision full circle, we were given two variables. The first was A and the second was U.
My assertion is that Bayes Theory is simple a rule-set by which to write a formula. A formula for a formula if you will. :o)
Kind of redundant in a matter of thinking but a useful teaching tool none the less.
BCPower
9 Feb 12 at 5:49 am
Thanx Oscar , that was really intuitive ..
Sree
18 Mar 12 at 2:01 am
Thanks for the explanation
Masud
16 Apr 12 at 5:10 am
Thanks! You proved how truly understanding something means being able to teach it in a way that grade school kids can understand :)
Chris
8 May 12 at 4:40 am