Scientific Method and Probability

In previous episodes, I discussed the principles of the scientific method: observability, falsifiability, and reproducibility. I gave some examples of those principles using scientific theories that make predictions of certainty, such as Newtonian mechanics. But in recent years, more and more scientific discoveries make probabilistic predictions instead of predictions of certainty, especially in fields such as biology, sociology, and psychology. You may have heard of the p-value before. The p-value is the cornerstone of many research papers. Unfortunately, there are many misconceptions about and misuses of the p-value, even among the researchers who use it in their publications. In 2005, the Greek researcher John Ioannidis published a paper titled “Why Most Published Research Findings Are False”. It noted that misuse of the p-value can lead to false conclusions, even when researchers follow the standard research methodologies of their field and pass peer review. Fifteen years after the paper was published, it has over 9000 citations. More and more researchers have started to advocate for methodological reform to overcome what they call “the reproducibility crisis”. So what is the p-value, then? How does it, or does it not, fit into the traditional scientific method formalized by Karl Popper?

Before I get to the p-value, let’s first try to understand what probability is as defined in mathematics. Statistics is so ingrained in our everyday language that even those who have never formally studied it in school have some understanding when a news anchor proclaims that a political candidate has a 70% chance of winning an election. But what exactly does it mean? How was the probability measured? Did they create 100 parallel universes with conditions identical to the present, wait it out, and see that the candidate won in 70 of them? If the candidate does not win the election, does it mean the prediction was wrong? Can we even know whether the prediction was wrong if there is only one future that we live in? Mathematical concepts sometimes make it into our daily language, but unfortunately, more often than not, the true meaning of those concepts is not preserved. For example, the term exponential growth, in mathematics, means that the growth can be modeled by an exponential function. But when we use it in everyday English, we just mean that something grows really fast.

So, what does probability mean, mathematically? Consider a bag of 10 marbles: 6 red, 4 blue. I am going to use this example a lot throughout this episode. You draw one marble out at random. What is the chance of getting a red marble, and what is the chance of getting a blue one? I hope it is clear that the chance of drawing a red marble is 6 in 10, or 60%, and the chance of drawing a blue one is 4 in 10, or 40%. Here is the mathematical definition of probability: the probability of a given outcome is the ratio of the number of times that outcome occurs to the total number of trials when we repeat the exercise an infinite number of times. In the case of the bag of marbles, each trial is drawing a random marble from the bag. Even though the color of the marble we draw may differ from trial to trial, we cannot know the color before drawing it out, so we consider every draw to be identical. If we want to calculate the probability of drawing a red marble, we just need to keep drawing a marble from the bag, examining its color, putting it back, and repeating the process. The probability of drawing a red marble is then the ratio of the number of red marbles drawn to the total number of marbles drawn when we repeat the experiment an infinite number of times. As you may have noticed, this process requires repeating the experiment an infinite number of times, which is, of course, not possible in reality. Also, computing the probability this way means taking the ratio of one infinitely large number to another, but I will not go into mathematical limits here.
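For readers who do want to see the definition written down, here is a rough sketch of it as a limit. The symbols are my own notation, not anything standard to this series: n is the total number of draws (with replacement), and n_red is the number of those draws that came out red.

```latex
P(\text{red}) = \lim_{n \to \infty} \frac{n_{\text{red}}}{n}
```

Both n_red and n grow without bound, but their ratio settles toward a fixed number, and that number is what we call the probability.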

In the case of the marbles, we assume that all marbles have an equal chance of being drawn from the bag. This is a very, very important assumption, and I will examine its implications later. It allows us to calculate the probability analytically instead of by experiment: the probability of getting a red marble is 60%, or 6 / 10, the number of red marbles over the total number of marbles. In the real world, we can never actually measure the true probability of anything, as that would require performing the experiment an infinite number of times. But fortunately, by the law of large numbers, first proved by the mathematician Jacob Bernoulli and published in 1713, we know that as the number of experiments grows, the empirical probability converges to the true probability. Unfortunately for us, how large the number of experiments has to be depends on the true probability, which in most cases we do not know. In general, though, the larger the number of samples we have, the closer the result tends to be to the true probability. Using the marble example, let’s say we draw marbles out of the bag 100 times and see 50 red marbles; our empirical probability of red is then 50%. If we run the experiment 1000 times, we may see 610 red marbles; our empirical probability of red is then 61%. As an oversimplified statement: the more experiments we conduct, the closer the empirical probability will be to the true probability.
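To make the law of large numbers concrete, here is a minimal simulation sketch of the marble experiment. The function name, the fixed seed, and the particular sample sizes are my own choices for illustration; the only thing taken from the example above is the bag of 6 red and 4 blue marbles drawn with replacement.

```python
import random


def empirical_red_probability(n_draws, n_red=6, n_blue=4, seed=0):
    """Draw a marble with replacement n_draws times and return the fraction of reds."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    bag = ["red"] * n_red + ["blue"] * n_blue
    # rng.choice gives every marble in the bag the same chance of being drawn.
    reds = sum(1 for _ in range(n_draws) if rng.choice(bag) == "red")
    return reds / n_draws


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"{n:>7} draws: empirical P(red) = {empirical_red_probability(n):.4f}")
```

With small sample sizes the printed value can sit noticeably away from 0.6; as the number of draws grows, it should drift closer to it, though never with a guarantee for any single run.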

As we can see, there are actually three different types of probability: the analytic probability, which we calculate based on the assumption that every event has an equal chance of occurring; the empirical probability, which we calculate from our empirical data; and the true probability, which is the one we want but unfortunately cannot obtain. The true probability can only be estimated through analytic or empirical probabilities. When we see a science article or a news report using probabilities, we should first try to figure out which kind of probability it is claiming: empirical or analytic? Then we can figure out how to verify or reason about those probabilities.

Unless our claim is 100% or 0%, a sample size of 1 is not enough to draw any meaningful conclusion about the probability. If I claim I have a coin that will land heads 80% of the time and tails 20% of the time, and I flip it once and get tails, can you conclude for certain that I was lying? No. It may be tempting to conclude that there is an 80% chance that I was lying, but that is another common mistake people make regarding conditional probability, which I will get to in a later episode. As for the win rate of each candidate: unless the two candidates have competed many times under identical circumstances, there is simply no meaningful way to compare their win rates or verify claims about their chances of winning, because every election and its conditions are unique. To verify someone’s empirical probability claim, we first need to see that they have used a large number of samples. The larger the sample size is, the closer the empirical probability can be to the true probability.

How about analytical probabilities? An analytical probability calculation is based on the assumption that all events have an equal chance of occurring. Breaking this assumption is often where sampling bias comes from. If I want to compute the analytical probability of getting a red marble out of the bag, I first assume that each marble has an equal chance of being drawn, that is, 1 in 10. Since 6 of them are red, there is a 6 in 10 chance that I will get a red one. But saying that each marble has an equal chance of being drawn is actually an extremely strong assumption. It usually means that I know nothing about the marbles before drawing them out. If, for instance, the blue marbles are slightly bigger than the red ones, and I have a tendency to pick bigger marbles over smaller ones, then my chance of drawing a red marble would not be 60% as the analytical calculation suggests. In the real world, we often make the mistake of thinking that each event has an equal chance of occurring when it does not. For instance, one of the common ways to conduct political polls is to make random phone calls to landlines. It may seem that, using a computer, we can give each landline number an equal probability of being called. But in reality, because older people are far more likely than younger people to have a landline and to answer the call, the poll disproportionately represents the older population, and the analytical probability it computes is therefore biased and does not equal the true probability. Similarly, throughout the 20th century, most medical trials and tests were done on white males averaging about 40 years of age. A lot of medical standards, such as healthy levels of biomarkers or the doses needed for treatment, are based on the average values of that specific demographic. But women and people of other races do not always share the same average values, nor do they respond to drugs in exactly the same way as a middle-aged white male. Even though modern medicine is trying to be more inclusive in its data collection, we are still living in the shadow of this bias, and much past data is still in need of correction.
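Here is a small sketch of how breaking the equal-chance assumption shifts the result in the marble example. The weight of 1.5 for blue marbles is a made-up number standing in for "bigger marbles are easier to grab"; nothing in the example fixes its real value, and the function name is hypothetical.

```python
import random


def red_fraction(n_draws, blue_weight=1.0, seed=0):
    """Fraction of red draws when each blue marble is blue_weight times as likely
    to be picked as each red marble (blue_weight=1.0 is the equal-chance case)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    bag = ["red"] * 6 + ["blue"] * 4
    weights = [1.0 if color == "red" else blue_weight for color in bag]
    # random.choices samples with replacement, proportionally to the weights.
    draws = rng.choices(bag, weights=weights, k=n_draws)
    return draws.count("red") / n_draws


if __name__ == "__main__":
    print("equal chance:    ", red_fraction(100_000, blue_weight=1.0))
    print("blue 1.5x likely:", red_fraction(100_000, blue_weight=1.5))  # hypothetical bias
```

Under the equal-chance assumption the fraction should land near the analytical 60%. With the assumed weight of 1.5, the four blue marbles together carry as much weight as the six red ones (4 × 1.5 = 6), so the fraction should land near 50% instead, even though the contents of the bag never changed.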


Now that we know how to calculate probabilities, how should we interpret them? Let’s look at the following problem. Back to our bag of 10 marbles again: 6 red, 4 blue. If you choose one marble at random, what is the chance that the marble is to be red? We learned in middle school that the answer is 60%. But this question is what Buddha would call a question wrongly asked. There are two different interpretations of the phrase “to be”. If the interpretation is the chance that you take a random marble from the bag and that marble is red, then the chance is 60%. But what if the interpretation is the chance that a given marble is red, provided that it has already been pulled out of the bag? For that, it is either 100% or 0%. You may be confused and think that those two interpretations are the same, or at least sound the same, because a lot of classes that teach probability phrase things both ways. In fact, the second interpretation is probably more common than the first. But I will show you why they are different, and why it is important to separate them in our daily language.

The first interpretation is the chance that you take a random marble from the bag and that marble is red. It is a statement about the population, or the sample space: specifically, the chance of getting a certain result when taking a random sample from that population. But the second interpretation, the chance that the given marble is red provided that it has been pulled out of the bag, is a statement about the specific marble we have picked: what is the color of the marble we have just pulled out of the bag? The marble is either red or blue. It is not a mix of both until we observe it. As Schrödinger pointed out with his cat-in-the-box example, it is absurd to think of a large-scale object as being in a contradictory state until we observe it. The marble is not 60% red and 40% blue until we look at it. It is either red or blue. Now suppose you have pulled a marble from a bag with 6 red marbles and 4 blue marbles, and you say the marble has a 60% chance to be red and a 40% chance to be blue. Then, without looking at it, you put the marble into another bag, open that bag up, and see 2 red marbles and 2 blue marbles. Is the chance of the marble being red now 50%? Did something change about the marble when we moved it from one bag to the other? No. Then why should the chance of the marble being red change?

Well, it shouldn’t. If the marble was red, it is still red; if it was blue, it is still blue. It is not the marble that changed color when we moved it from one bag to another, but our belief or confidence about it. Interpretation 1 is usually called the frequentist interpretation. Interpretation 2 comes from the Bayesian interpretation, but with an important and incorrect modification. The Bayesian interpretation is about confidence or belief under incomplete information. When I draw a marble from the bag without looking at it, I am 60% confident that it is red and 40% confident that it is blue. This interpretation describes our subjective beliefs based on the evidence at hand, not objective reality. Interpretation 2 ignores the subjective part: instead of talking about our confidence about the marble, it states the chance as if it were a raw fact about the marble. That is completely wrong and can lead to disastrous conclusions, which I will illustrate later. Probability is a characteristic of the population; it tells us little to nothing about individual samples from that population. We can use it to form beliefs about the marbles, but not conclusions about them.

Let’s look at another problem. Consider a university where 60% of registered students graduate in 4 years. You are a student newly accepted to that university. What is your chance of graduating in 4 years? This is exactly the same mathematical problem as the marbles in a bag, but our interpretation of the numbers can be completely different. In the marble problem, because we know that which bag the marble is in does not influence its color, it is absurd to interpret the probability as the marble having a 60% chance of turning red when we pull it out of the bag. But here, we know that the university we attend does influence our graduation. So can we say that we have a 60% chance of graduating in 4 years, given that we are attending a university with a 60% graduation rate? Again, this conclusion has the same logical error as in the marble problem. Say you are a hard-working student. Even though you are attending a school with only a 60% graduation rate, you might still be almost guaranteed to graduate in 4 years because you are hard-working. But if you are a hardcore slacker, you are most likely going to drop out within 4 years, even though 60% of your class is going to graduate. Yes, the university can have a huge impact on someone’s study habits, and a good professor can go a long way in someone’s education, but this single number does not tell us how the university impacts someone’s education, or even a given attendee’s chance of graduating. The number 60% is a measurement of correlation: the ratio of the number of people graduating from the given university in 4 years to the number of people attending it. It does not tell us the chance of a given individual graduating from the university, nor does it tell us how the university influences a student’s chance of graduating. If we answer the question by saying that your chance of graduating is 60%, we are ignoring the fact that all students are different: they come from different socio-economic positions and have different study habits. We have also mistaken correlation for causation. The 60% is a correlation, but when we say that your chance of graduating is 60%, we are in effect saying that attending the university gives you a 60% chance of graduating and that how hard you work doesn’t really matter. If it does matter, and we do not know how hard you work, then we cannot know how likely you are to graduate in 4 years, even though we know that 60% of students at the university graduate in 4 years.
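As a toy illustration of how one population-level number can hide very different individual situations, here is a sketch with two invented groups of students. The group labels, the 50/50 split, and the 95% and 25% rates are all hypothetical numbers I picked so that the total comes out to 60%; they are not data about any real university.

```python
# Hypothetical student body: group -> (share of students, rate of graduating in 4 years).
# Every number here is made up for illustration.
GROUPS = {
    "hard-working": (0.5, 0.95),
    "slacker": (0.5, 0.25),
}


def overall_rate(groups):
    """Population-level graduation rate: the share-weighted average of group rates."""
    return sum(share * rate for share, rate in groups.values())


if __name__ == "__main__":
    print(f"overall: {overall_rate(GROUPS):.2f}")
    for name, (share, rate) in GROUPS.items():
        print(f"{name}: {rate:.2f}")
```

The overall rate works out to 0.5 × 0.95 + 0.5 × 0.25 = 0.60, yet in this made-up setup no group of students actually graduates at a 60% rate. The 60% describes the population, not any particular student in it.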

Wrong interpretations of probability can not only lead us to wrong conclusions about our chance of graduating, they can also lead to unconscious biases. Some people may think that so far I have just been arguing over semantics when talking about the phrase “to be” and its different interpretations. But I will show you why it is not just semantics, and why it is important to interpret mathematical numbers as closely to what they measure as possible. Consider the following question. We know that in the U.S., African Americans on average have less education and earn less than Whites. Then, if we have two candidates for a job, one African American and one White, can we say that the African American candidate is likely to be less educated than the White candidate? This question will probably make a lot of people’s faces twitch immediately. But think about it: if we think it is OK to say that there is a 60% chance that the marble we chose is red, why can’t we conclude that the African American candidate is likely to be less educated than the White one? It is applying the same interpretation of probability. A lot of people agree with this logic when they look at the numbers, because this is how they learned to interpret probabilities in middle school. This interpretation is completely wrong and can lead to biased conclusions, yet the person who uses this reasoning and holds those biased beliefs may think they are being completely logical, because this is how they learned to interpret probabilities in their math class.

Mathematics and raw numbers have a tendency to feel objective, because we learned in school that 1 + 1 = 2 is always true and is not up for debate or interpretation. Except that it is. Whenever we translate a mathematical expression into English or use mathematics to describe the physical world, we are interpreting the math and giving it real-world meaning. And many of those interpretations can be not only wrong but extremely misleading. If you are interested, in my previous episodes on the scientific method I had a deeper discussion of how we can verify whether our interpretation of a mathematical model is correct, using the scientific method and the linguistic agreement test. The language we use to express mathematical concepts can not only shape the conclusions we draw but also introduce bias into how we view the world.

In this episode, I want to emphasize the meaning of probabilities. A probability is a measurement of some characteristic of a population. For instance, in the marble example, the probability of 60% is the chance of getting a red marble from the bag. It is a measurement of the outcome of performing an operation on the population. It tells us little, if anything, about a specific individual in the population. We should avoid drawing conclusions about individuals in a group based on a probability we know about the group. The 60% chance is the chance of drawing a red marble from the bag. It does not tell us the color of any given marble from the bag. Similarly, if we take a random member of a minority group from the population, they are likely to have less education and less income than a random White person sampled from the population. But that does not tell us the income or education level of any specific member of a minority group. Furthermore, even though certain characteristics are correlated within a population, probabilities alone do not tell us why they are correlated, and we should not draw conclusions about causality from correlation. There can be deep socio-political reasons why wealth and education are distributed the way they are now, and data from the present cannot tell us how we got here. We will have to look to the past for answers to those questions.

On a side note, I am not very comfortable with the use of the word “causality”. The concept of causality often implies that, in a chain of events, if one event in the chain had not happened, we would not be in our current situation. Causality is often talked about in the context of assigning blame. If someone gets into a car accident, we attribute the cause to their driving or their disobedience of the law rather than to the invention of the automobile, even though without cars the accident would not have happened. But in the real world, events are influenced by many factors; some are human choices and some are just random chance. For instance, the quality of someone’s education can be influenced by what family they were born into, what teachers they encounter throughout their life, and their own personal effort. A person who is born in a poor country can work extremely hard but may never reach the same level of education as someone from a developed country who pays little attention to school. We like to think that all events in life, good or bad, are the results of personal choices, but that simplistic model does not reflect how the world works. In science, I prefer to think in terms of influences rather than causes. Instead of saying “science has found that A causes B”, it is probably better to say “science has found that A has an influence on B”. Yes, it would not be as headline-worthy, but it avoids the inherently subjective, blame-assigning nature of the word “cause” and acknowledges the complexity of the interwoven parts of the world.

In the next episode, I will dive into conditional probability and the p-value, the bread and butter of modern scientists, and why, as Mark Twain said, there are three kinds of lies: lies, damned lies, and statistics.
