This is infuriating.

In June 2004, Lucia was convicted of 7 murders and 3 attempted murders by the Court of Appeal in The Hague and given a life sentence, a perplexing verdict in view of the lack of evidence. There were no eyewitnesses and no direct incriminating evidence. Lucia was never seen in a suspicious situation, and she was never found in possession of any of the poisons she was alleged to have used.

So how did they catch this supposed murderer? Why were they even investigating her?

Everything started with what seemed, at first glance, a striking number of incidents (deaths or resuscitations) during Lucia's shifts at the Juliana Children's Hospital (the JKZ) in The Hague. The run drew attention to her: seven incidents in a row, all during the shifts of one nurse, could not possibly be a matter of chance! The services of Henk Elffers, a former statistician and now professor of Psychology of Law, were called in, and the number he came up with must have wiped out any remaining doubt. He figured that the probability of all seven incidents happening during Lucia's shifts by pure chance was 1 in 6,000,000,000.

So instead of looking at the data to support a theory, they looked at the data to form a theory. This is totally the wrong approach. You can find all sorts of patterns in a large enough data set. That is why seasoned researchers form a theory first and then gather or analyze data to test it. If you have no theory, you're just doing cargo cult science. As for the 1 in 6,000,000,000 chance, it looks like a case of the Birthday Paradox: given enough deaths and nurses, the probability that some nurse is present at 7 consecutive incidents is pretty high. Ben Goldacre has more.
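To see how the Birthday Paradox plays out here, a toy Monte Carlo sketch helps. Every number in it (27 nurses per ward, each working a third of the shifts, 1,000 comparable wards nationwide) is an illustrative assumption, not a real figure from the case, and each nurse's presence at each incident is crudely modeled as an independent coin flip:

```python
import random

random.seed(0)  # reproducible toy run

def ward_has_unlucky_nurse(n_nurses=27, n_incidents=7, shift_fraction=1/3):
    """Simplified model of one ward: treat each nurse's presence at each
    incident as an independent coin flip with probability equal to the
    fraction of shifts she works. Return True if any single nurse
    happened to be on duty for all of the incidents."""
    return any(
        all(random.random() < shift_fraction for _ in range(n_incidents))
        for _ in range(n_nurses)
    )

trials = 10_000
per_ward = sum(ward_has_unlucky_nurse() for _ in range(trials)) / trials

# Chance that at least one ward in the country produces such a nurse,
# under the hypothetical assumption of 1,000 comparable wards:
n_wards = 1_000
p_somewhere = 1 - (1 - per_ward) ** n_wards
print(f"per-ward probability = {per_ward:.3f}, somewhere = {p_somewhere:.3f}")
```

For any one nurse the run looks wildly improbable, yet the simulation shows that *somewhere*, *some* nurse ending up with such a run is close to certain. Scanning the country for the unluckiest roster and then computing the odds for that one nurse is exactly the data-dredging mistake described above.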

Even more bizarre was the staggering foolishness of some of the statistical experts used by the court. One of them, Henk Elffers, combined individual statistical tests by taking p-values – a mathematical expression of statistical significance – and multiplying them together. This bit is for the nerds: you do not just multiply p-values together; you combine them with a proper tool, such as Fisher's method for combining independent p-values. If you simply multiply p-values, chance findings will rapidly appear to be vanishingly unlikely. Say you worked in twenty hospitals, each with a pattern of incidents that is purely random noise: p = 0.5 in each. Multiply those harmless p-values of entirely chance findings together and you end up with a final p < 0.000001, falsely implying that the outcome is highly statistically significant. With this mathematical error, by this reasoning, if you change hospitals a lot, you automatically become a suspect.
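The twenty-hospital example can be checked directly. The sketch below contrasts naive multiplication with Fisher's method, which combines k independent p-values via the statistic X = -2 Σ ln(p_i), chi-square distributed with 2k degrees of freedom under the null; since 2k is even, the chi-square survival function has a closed form and no external library is needed:

```python
import math

def fisher_combine(pvalues):
    """Fisher's method for combining independent p-values.
    X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom when all nulls are true."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-square survival function for even df = 2k:
    # P(Chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

pvals = [0.5] * 20             # twenty hospitals, pure random noise
naive = math.prod(pvals)       # 0.5**20, under one in a million
proper = fisher_combine(pvals) # around 0.93: correctly unremarkable
```

Naive multiplication turns twenty completely unremarkable results into apparent near-certainty of guilt, while Fisher's method returns a combined p-value close to 1, i.e. exactly what random noise should look like.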

Multiplying p-values? Really?