For the past couple of weeks I’ve been trying to write an article explaining briefly what p-values are and what they really measure. Turns out there are enough subtleties involved that I keep writing and writing and haven’t published anything. So I’ve decided that it’s time for a change of tactic.

I’m going to work my way up to p-values, explaining each of the pieces in detail. When I’m done, I will write a summary that links back to the longer explanations; by then I should hopefully be able to give a more succinct account of the whole journey.

This is the first installment of the series, and it deals with the basic idea of probabilities.

For Math Geeks

In these boxes you’ll find formal definitions that are intended to complement the main text. If you are not a math geek, you can safely ignore these.

Let’s start at the very beginning,
a very good place to start.
— Maria
The Sound of Music

In the beginning there were probabilities

The idea behind naïve probabilities is simple. You have a Universe of all possible outcomes of some experiment (sometimes called a sample space and denoted by the Greek letter Omega: \(\Omega\)), and you are interested in some subset of them, namely some event (denoted by \(E\)). The probability of event \(E\) occurring is the cardinality (number of elements) of \(E\) divided by the number of all possible outcomes (the cardinality of \(\Omega\)): \(P(E) = |E|/|\Omega|\).


Say you are throwing a pair of dice. How many possible outcomes of this experiment can there be? If you ignore the possible but unlikely event that one of the dice will land on its edge, there are 36 possible outcomes. That means that the probability of getting snake eyes (two ones) is 1/36. You could even enumerate all the outcomes and construct a set like {(1, 1), (1, 2), …, (6, 6)} where each pair \((x, y)\) represents die 1 landing on \(x\) and die 2 landing on \(y\).
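If you'd rather let a computer do the counting, here is a minimal Python sketch of the same idea. The names `omega`, `naive_probability`, `snake_eyes` and `doubles` are my own choices for illustration, and the function bakes in the assumption that every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (die 1, die 2).
omega = set(product(range(1, 7), repeat=2))

def naive_probability(event, sample_space):
    """|E| / |Omega|, assuming every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

snake_eyes = {(1, 1)}
doubles = {(x, y) for (x, y) in omega if x == y}

print(len(omega))                            # 36
print(naive_probability(snake_eyes, omega))  # 1/36
print(naive_probability(doubles, omega))     # 1/6
```

I used `Fraction` instead of floats so the results stay exact and look like the fractions in the text.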

I said naïve before because this assignment of probabilities makes a couple of implicit assumptions about the sample space and the events. First of all, it assumes that the sample space is finite. I’m going to completely ignore infinite sample spaces and instead focus on the second implicit assumption: that each outcome is equally likely.

What if some outcomes are more likely than others? For example, what if the dice are loaded? All of a sudden 1/36 doesn’t look like such a good probability assignment for snake eyes.

In the general sense, you don’t have to assign equal probabilities to each of the outcomes. It’s usually just a good starting point to assume that this is the case. But if you know that this is not the case, then starting with equal probabilities is not very smart.
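To make the loaded dice concrete, here is a small sketch in Python. The weights are invented purely for illustration (a hypothetical die that shows a one half of the time), and I am also assuming that the two dice don't influence each other, so the probability of a pair is the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical loaded die: a 1 comes up half the time,
# and the other five faces share the remaining half equally.
loaded = {1: Fraction(1, 2), **{face: Fraction(1, 10) for face in range(2, 7)}}
fair = {face: Fraction(1, 6) for face in range(1, 7)}

def pair_probabilities(die_a, die_b):
    """Probability of each ordered pair, assuming the dice are independent."""
    return {(x, y): die_a[x] * die_b[y] for x, y in product(die_a, die_b)}

def probability(event, outcome_probs):
    """Add up the probabilities of the outcomes that make up the event."""
    return sum((outcome_probs[o] for o in event), Fraction(0))

fair_pair = pair_probabilities(fair, fair)
loaded_pair = pair_probabilities(loaded, loaded)

print(probability({(1, 1)}, fair_pair))    # 1/36
print(probability({(1, 1)}, loaded_pair))  # 1/4
```

With these made-up weights, snake eyes goes from 1 in 36 to 1 in 4, even though the set of possible outcomes is exactly the same.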

As an example, in the Monty Hall problem, if your second cousin thrice removed is part of the staff and he lets you in on the fact that the car is not behind door number three, that completely changes the problem. You would never assign \(P = 1/3\) to each of the doors. You know for a fact that the probability of the car being behind door number three is exactly zero.

In a general sense then, probabilities can’t be defined by just counting possible outcomes. They must be defined as general functions that map a set of outcomes to numbers between zero and one. They must, of course, satisfy some special properties.

For Math Geeks

A probability function \(P\) maps each event \(E\) (a subset of the sample space \(\Omega\)) to a number in the interval \([0,1]\), and satisfies the following three properties:

  • \( P(E) \ge 0 \textrm{ for every } E\)

  • \( P(\Omega) = 1\)

  • \(\textrm{if }E_1, E_2, \ldots\) are disjoint, then \(P\left(\displaystyle\bigcup_{i=1}^{\infty}E_i\right) = \displaystyle\sum_{i=1}^{\infty}P(E_i)\)
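For a finite sample space you can check these three properties by brute force. The sketch below does that for a hypothetical three-outcome space with unequal weights that I made up; the names `weights`, `P` and `all_events` are mine. Since there are only finitely many events here, I only test additivity on pairs of disjoint events, which is enough in the finite case but says nothing about infinite sample spaces.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical sample space with three outcomes and unequal weights.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(weights)

def P(event):
    """Probability of an event: the sum of the weights of its outcomes."""
    return sum((weights[o] for o in event), Fraction(0))

def all_events(sample_space):
    """Every subset of the sample space, including the empty set."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(omega)

# Property 1: no event has negative probability.
nonnegative = all(P(e) >= 0 for e in events)
# Property 2: the whole sample space has probability one.
normalized = P(omega) == 1
# Property 3 (finite version): probabilities of disjoint events add up.
additive = all(
    P(a | b) == P(a) + P(b)
    for a in events
    for b in events
    if not (a & b)
)

print(nonnegative, normalized, additive)  # True True True
```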

But notice that the previous definition of probabilities (\(|E|/|\Omega|\)) was very handy in the sense that just by knowing the cardinalities we had the appropriate probabilities. If we assign the probabilities unevenly, how do we describe them without having to enumerate each one individually?

This is where probability distributions help. And that will be the subject of the next post.