Probability is the only satisfactory way to reason in an uncertain world.

Dennis Lindley

An elementary notion in probability is Bayes’ rule, named after the Reverend Thomas Bayes, who probably did not discover it. Bayes’ rule is about flipping conditionals: if I know the probability of event A given that event B has occurred, what’s the probability of B given that A has occurred? That probability is a big neon sign:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

On the face of it, Bayes’ rule is a tautology, basically a restatement of the chain rule. But it is in fact fundamental to all kinds of inference, from robots learning about their environment from sensors, to recommender systems for movies, to a lot of machine learning, criminal justice, and medical diagnostics.

It is a depressing fact of modern medicine that most medical professionals incorrectly apply the results of diagnostic tests. Imagine that you have a test that is 99% sensitive (meaning it correctly identifies 99% of people who have a condition) and 99% specific (meaning that it incorrectly flags people who do not have the condition only 1% of the time). Say you take the test and the result comes back positive: what is the probability that you actually have the disease? Most people (including a frightening number of doctors) will say 99%. But that is missing a key part of the equation: the **prior odds**. If the condition is extremely rare (say 0.1% of the population), then the majority of people who test positive (10 out of 11) do not actually have it. In this scenario, somebody who tests positive has only a 1/11 chance of having the disease. Pretty different from 99%, wouldn’t you say? If you are curious, this video gives a good illustration of why the maths works out that way.
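The arithmetic behind that 1/11 figure is easy to check directly; here is a minimal sketch using the numbers stated above (99% sensitivity, 99% specificity, 0.1% prevalence):

```python
# Bayes' rule for the rare-disease example:
# P(disease | T+) = P(T+ | disease) * P(disease) / P(T+)
sensitivity = 0.99   # P(T+ | disease)
specificity = 0.99   # P(T- | no disease), so false positives occur 1% of the time
prevalence = 0.001   # P(disease): 0.1% of the population

# Total probability of testing positive (true positives + false positives)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive
print(f"P(disease | T+) = {posterior:.3f}")  # roughly 0.09, i.e. about 1 in 11
```

Despite the seemingly excellent test, a positive result is still wrong more than 90% of the time, because false positives from the huge healthy majority swamp the true positives from the tiny sick minority.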

Now, Bayes applied to medical diagnoses is nothing new, and indeed it is the subject of almost any tutorial on Bayes’ rule. The reason why I am bringing it up today is that there is increasing hope placed on antibody testing to help end the COVID-19 pandemic. Some observers are even imagining fairly dystopian scenarios in which the world would be segregated into a two-tier society: those immune to the virus, and those still at risk. They (and others) even imagine extreme measures by which people might be tempted to voluntarily get infected to develop immunity (assuming, of course, that they don’t die).

Those extreme scenarios aside, I thought the topic was a good one for my “Lies, Damned Lies and Statistics” class, where I basically restate Bayes’ theorem a hundred different ways until students get it. I thought I would share the problem here, since Bayes’ theorem, basic though it is, remains widely underappreciated in decision-making, even when lives are literally at stake.

Let’s say you manage workers who might be exposed to SARS-CoV-2, and you would like to make sure to send only workers who are immune to it into those high-risk environments. A worker shows up with an “immunity passport”: the result of a positive antibody test (T+) that has 99% sensitivity and 95% specificity (that’s optimistic given current tests, but again, this is for argument’s sake). What is the probability that they actually are immune to the virus?

Let’s translate this into math. “99% sensitivity” means that the conditional probability of testing positive given that you are immune is $P(T^+ \mid I) = 0.99$. Similarly, “95% specificity” means that the probability of testing positive given that you are *not* immune is $P(T^+ \mid \bar{I}) = 0.05$. Bayes’ rule (rephrased from the neon sign above) is:

$$P(I \mid T^+) = \frac{P(T^+ \mid I)\,P(I)}{P(T^+ \mid I)\,P(I) + P(T^+ \mid \bar{I})\,P(\bar{I})}$$

The probability we seek (being immune given that the test is positive) is a combination of four numbers: $P(T^+ \mid I)$ and $P(T^+ \mid \bar{I})$, which characterize the test’s accuracy, and, crucially, $P(I)$, the probability of being immune in the first place, as well as $P(\bar{I})$, the probability of not being immune. Now, obviously, we have the simple relation $P(\bar{I}) = 1 - P(I)$, so the last piece of the puzzle is knowing $P(I)$. In the case of a brand-new disease like COVID-19 it’s an unknown variable, which I am going to denote $h$, and for which another name might be **herd immunity**. Now, we can rewrite Bayes’ rule as:

$$P(I \mid T^+) = \frac{0.99\,h}{0.99\,h + 0.05\,(1-h)}$$
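This closed form is simple enough to wrap in a small helper function; a sketch, with the sensitivity and specificity values from the scenario above as defaults:

```python
def p_immune_given_positive(h, sens=0.99, spec=0.95):
    """Posterior probability of immunity given a positive antibody test,
    as a function of herd immunity h (the prior P(immune))."""
    false_pos = 1 - spec  # P(T+ | not immune)
    return sens * h / (sens * h + false_pos * (1 - h))
```

Sanity checks: at $h = 1$ everyone is immune and the posterior is exactly 1; at $h = 0$ no positive test can mean immunity and the posterior is 0, regardless of how good the test is.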

which is a closed-form function of the herd immunity parameter $h$. So, just as before, it’s not just about how accurate the test is; it’s also, crucially, about how prevalent the condition is. To see this, let’s plot this relationship:
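The plotted curve is easy to reproduce numerically; a minimal sketch that tabulates the posterior at a few values of $h$, using the 99%-sensitive, 95%-specific test from the scenario:

```python
# Tabulate P(immune | T+) against herd immunity h
# for a 99%-sensitive, 95%-specific antibody test.
sens, spec = 0.99, 0.95
for h in [0.01, 0.05, 0.10, 0.25, 0.50, 0.75]:
    post = sens * h / (sens * h + (1 - spec) * (1 - h))
    print(f"h = {h:4.0%}  ->  P(immune | T+) = {post:.1%}")
```

The posterior climbs steeply with $h$: it starts out far below what the test’s headline accuracy suggests, and only approaches comfortable territory once a substantial fraction of the population is immune.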

We see that for low herd immunity (say, 5%), even a reliable test doesn’t give you a very good guarantee that an immunity marker is synonymous with immunity: the posterior probability is only about 51%, barely better than a coin flip. Still, 51% is roughly 10 times better than 5%, which shows you how much information the data added to your mental picture: you started from the prior (gray dashed line), and the evidence (here, the antibody test) brought you up to the blue line, which is considerably closer to 1, though perhaps nowhere near as close as you were hoping.

Indeed, you might be loath to make potentially life-or-death decisions based on a 49% probability. What would it take to bring that probability to 95%, say? Two ways:

- With this kind of test accuracy, once herd immunity passes 50%, the posterior is above 95%.
- If you can’t wait until then, the only way to boost the posterior is to make the test radically more accurate.
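To see what “radically more accurate” has to mean, we can rearrange the closed-form posterior to find the largest false-positive rate $f = P(T^+ \mid \bar{I})$ that still delivers a 95% posterior, keeping the 99% sensitivity fixed (a back-of-the-envelope calculation):

```latex
\frac{0.99\,h}{0.99\,h + f\,(1-h)} \ge 0.95
\quad\Longleftrightarrow\quad
f \le \frac{0.05}{0.95}\cdot\frac{0.99\,h}{1-h} = \frac{0.99\,h}{19\,(1-h)}
```

At $h = 5\%$, this requires $f \le 0.0027$, i.e. a specificity above about 99.7%, which is a very tall order for a mass-produced serological test.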

Below is a case where the rate of false positives has been brought down from 5% to 1% (that is, the test is now 99% specific):

We see that limiting erroneous diagnoses (false positives) really brings up the posterior probability, particularly for low herd immunity. Yet, for a herd immunity below roughly 16%, that is still not enough to reach 95% certainty.
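As a quick check on the improved-test case, the same calculation with the false-positive rate lowered to 1%:

```python
# Posterior P(immune | T+) with a 99%-specific test (1% false positives),
# at a few low herd-immunity levels.
sens, false_pos = 0.99, 0.01
for h in [0.05, 0.10, 0.15]:
    post = sens * h / (sens * h + false_pos * (1 - h))
    print(f"h = {h:4.0%}  ->  P(immune | T+) = {post:.1%}")
```

Even with the better test, the posterior at $h = 5\%$ sits in the low-to-mid 80% range, well short of 95%.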

So, like *all* medical diagnostics, and more broadly speaking all detection problems, it’s not only about the accuracy of the test, but also about the prevalence of the condition you are trying to diagnose. For well-established diseases, the incidence rate is often sufficiently well characterized that it’s not a limiting factor in decision-making. But in this case, it is: we have to know both how good the test is (e.g. from laboratory experiments) and how prevalent herd immunity is at the time the test is administered. We cannot know the latter without independent information about who has the virus or not (e.g. from PCR-based tests).

So I cannot imagine that any rational agent will rely on antibody tests to make these kinds of decisions anytime soon — not until we have much more information about herd immunity. That being said, the Trump administration is notorious for making irrational decisions about nearly everything, so we might very soon be using antibody tests as if they told us everything we want to know. At least, thanks to Bayes, we’ll have the intellectual satisfaction of knowing exactly how dumb that was. In an age of radical anti-intellectualism, that seems to be the only satisfaction left to intellectuals.