Scientific Method: 3. Falsifiability

Mathematics often gets a bad reputation among students. It is often perceived as abstract, confusing, useless, and completely made up. Those perceptions are not all true. Math does not just deal with obscure numbers and equations; it is a language for describing relationships between, well, everything. In fact, every sentence we speak or think is a description of the relationships between different entities. Take the following sentence as an example: “I ate a peanut butter sandwich for lunch yesterday.” At first glance, it may seem a trivial description of an unremarkable event, but it is in fact packed with information. It describes the relationship between the sandwich and me: I have eaten it. It contains the relationship between the event and the time at which it occurred: “at lunch”, and “yesterday”. It also informs us of the relationship between the sandwich and its constitution: it is made of peanut butter.

If I said “I eat a peanut butter sandwich for lunch yesterday” or “I ate a peanut butter sandwiches for lunch yesterday”, a native speaker would immediately recognize my grammatical errors: the present-tense verb does not agree with the past time, and the plural noun does not agree with the singular article. Even though we don’t usually think about it when we speak, our daily language does have a set of rules that the composition of words should follow to eliminate ambiguity when we communicate; we know those rules as grammar. If I were to break the rules and say “I ate a peanut butter sandwiches for lunch yesterday”, the communication becomes unclear, as the listener cannot tell from the sentence alone how many sandwiches I consumed. Good communication requires fewer contradictions in the language we use, and grammar facilitates this need to a certain extent. But as anyone who has ever tried to communicate knows, what people think we mean is often not what we want to express. In the last episode, I illustrated how the word “freedom” can mean different things to different people and ultimately lead to opposite conclusions about how beneficial the emancipation of slaves was to freedom. This ambiguity and flexibility of natural language allow us to compose sonnets and plays, but in many cases they also make it challenging to communicate transparently and clearly.

Mathematics, similarly, is a way to describe the relationships between entities. 2 + 3 = 5 states the relationship of equality between putting 2 and 3 together and the number 5. It is somewhat like saying “if you put 2 apples in a basket, then another 3 in the same basket, there will be 5 apples in this basket.” What is different between the language of mathematics and the language of English is that the grammar of math is far more restrictive, and the language of mathematics is designed to be free of contradictions. Any two statements made in the same mathematical language cannot contradict each other. It is not possible to get 2 + 3 = 5 and 2 + 3 = 4 at the same time if we follow the mathematical language correctly. But this is not the case for natural language; as the example of “freedom” demonstrated, it is possible to draw conclusions such as “the emancipation of slaves was good for freedom, but also bad for freedom.”

A linguistic system that is designed to be free of logical contradictions is called a formal system, or a formal language. In contrast, a linguistic system that contains logical contradictions is called an informal system, or a natural language. An obvious example of a natural language would be English. Mathematics is the study of all formal languages or systems. But formal systems are not just numbers and equations. Video games, for example, are formal systems, at least those without many bugs. They play by a set of very specific rules, and all players are forced by the program to behave according to those rules. When those rules are enforced, the game will not get into a logically inconsistent state: two players will not occupy the same virtual space, and the game will usually end with only one winner. But video games are not the only formal systems we commonly interact with. Humans have been designing formal systems for millennia. Board games such as chess and go are formal systems, as are card games such as poker and blackjack. They all have a fixed set of rules, and if all players choose to follow them, the state of the game will be consistent. In a chess game, at the end of each turn, no two live pieces can occupy the same square. In poker, if someone ends up with more cards than the others, or five kings show up, we know something fishy is going on. By agreeing on the rules of play, something very remarkable happens: unlike our daily arguments, in which people often seem to have a hard time agreeing on a conclusion even when presented with the same evidence, when shown a chessboard or two poker hands, people agree on who is the winner and who is the loser. This power of agreement enabled by formal systems has led more and more scientists to favor mathematics over natural languages as the language for scientific theories, because an objective standard should be one that everyone agrees on, and the correct use of mathematics can make that happen.

But unfortunately, simply agreeing on the rules does not always mean the rules constitute a contradiction-free formal system. We can make up a simple game: two players each draw a number between 1 and 10, and a player who draws a 2 or a 4 wins the game. You might immediately see two problems. First, what happens if one player draws a 2 and the other draws a 4; who wins the game? Second, what if neither of them draws those two numbers? You see, not all sets of rules using numbers lead to unambiguous answers. Many scientists have a set of rules they play by to draw conclusions, such as statistical significance tests, but those rules don’t always lead to an unambiguous answer about the validity of a theory. That is, sometimes the rules say the theory is right, and other times they say it is wrong. But I will get to statistics later.
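
To make that gap concrete, here is a minimal Python sketch of the made-up game (the function name and the simulation are mine, purely for illustration). The rules simply fail to define a winner in two of the possible outcomes:

```python
import random

def ambiguous_game():
    """Each player draws a number from 1 to 10; drawing a 2 or a 4 wins.
    The rules say nothing about ties or about neither player winning."""
    a = random.randint(1, 10)
    b = random.randint(1, 10)
    a_wins = a in (2, 4)
    b_wins = b in (2, 4)
    if a_wins and b_wins:
        return "undefined: both players drew a winning number"
    if not a_wins and not b_wins:
        return "undefined: neither player drew a winning number"
    return "player A wins" if a_wins else "player B wins"

# Running it a handful of times quickly surfaces the undefined outcomes.
for _ in range(10):
    print(ambiguous_game())
```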

I will start with the set of rules mathematicians play by: logic. Before I define logic, I want to make two concepts clear: premises and conclusions. In an argument, premises are statements that we hold true without providing any justification. We take the truth of the premises for granted, and they are the entry point of an argument. If every statement required justification from other statements, we would unavoidably end up with either infinite regress or circular reasoning. What’s wrong with infinite regress? Consider the “why” game kids like to play. “Why does the Sun rise every morning?” “Well, it is because the Earth rotates on its axis.” “Why does the Earth rotate on its axis?” “Probably because when the Earth was formed, the dust, rocks, and gas orbiting around its core transferred their angular momentum to the Earth.” “Why can angular momentum be transferred from one object to another?” “…” You see, we can keep asking why, but at some point we will find ourselves stuck at the end of the chain with a question we cannot answer; even God cannot save us. We can say “That’s how God created the world,” but then how do we answer the question “Why did God create the world the way it is?” The problem with infinite regress is that if we keep asking why, for any question we will get stuck at some point; therefore no statement can be sufficiently justified without assuming something to be true in the first place.

What about circular reasoning? Take the following example: “You must obey the law because it’s illegal to break the law.” The premise is that it is illegal, or wrong, to break the law; the conclusion is that it is therefore right to obey the law. But saying it is wrong to break the law is equivalent to saying it is right to obey the law. That is, the premise and the conclusion are essentially the same in this argument. Arguments whose conclusions are already contained in their premises are called “petitio principii”, or by the unfortunate English translation, “begging the question”. It would probably be better translated as “assuming the question”, but the old translation has stuck. Begging the question is the simplest and most common form of circular reasoning. You may have noticed that the argument “You must obey the law because it’s illegal to break the law” says nothing about what the law is, or should be. The law may say not to steal, or it might force people into a cage to fight to the death in a battle royale, as in The Hunger Games. Some may say that the law only enforces good things and therefore we should obey it, which is again begging the question: it assumes the law is good and then concludes that obeying it is good. The issue with circular reasoning is that we can use it to prove basically anything; all we need to do is assume what we want to prove in the first place.

Logic is one type of reasoning process that carries an argument from premises to conclusions. I want to make sure I am clear here: I am using the word “logic” very narrowly. It means exclusively mathematical logic, or deductive reasoning, in which true premises always lead to true conclusions. There are other types of reasoning as well, in which true premises can sometimes lead to false conclusions. Those are fallacies; circular reasoning and infinite regress are two of them.

Consider the following argument: “All humans are mortal, I am a human, therefore I am mortal.” The two premises here are “All humans are mortal” and “I am a human”. The conclusion is “I am mortal”. It is called deductive reasoning because it uses an assumption about the general case, the mortality of all human beings, to deduce a specific case: am I mortal? Deductive reasoning is the foundation of mathematics. In mathematics, we start with definitions, statements about the general case that we hold true, and from there we draw conclusions about special cases. For instance, we can start with the premise that the interior angles of every triangle sum to 180 degrees; then, for right triangles, which have one 90-degree angle, the other two angles must sum to 90 degrees. We have just deduced a property of a special set of triangles, right triangles, based on our assumption about all triangles.
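
Written out as a small worked derivation, with α, β, γ standing for the three interior angles:

```latex
\begin{align*}
\alpha + \beta + \gamma &= 180^\circ && \text{premise: true of all triangles}\\
\gamma &= 90^\circ && \text{premise: this triangle is a right triangle}\\
\therefore\ \alpha + \beta &= 180^\circ - 90^\circ = 90^\circ && \text{conclusion about the special case}
\end{align*}
```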

Instead of going from the general to the specific, we can go from the specific to the general in mathematics, using what is called “reductio ad absurdum”, or proof by contradiction. It goes as follows: I start by claiming “all mammals have tails”; you want to show that I am wrong, so you point out that I do not have a tail, and therefore “all mammals have tails” cannot be true. When we make broad statements about the general, we can show the statement to be false by providing a counterexample, or by showing a contradiction. All mathematical proofs, without exception, are based either on deductive reasoning or on proof by contradiction. If you pay attention, you will notice that our daily sentences are either broad statements about a group or statements about specific cases. In mathematics, we call a statement about all members of a group a universal quantification. It can be “all humans are mortal”, “all mammals have tails”, or, in the real world, “all politicians are corrupt” or “all college graduates are smart”.

You may immediately realize that making broad generalizations about human groups simply doesn’t work in most cases. It is because of how those groups are formed. In mathematics, all triangles have three edges because we call a shape a triangle only if it has three straight edges that connect to each other. But there isn’t a social club exclusive to corrupt people that calls itself “the politicians”. The group of triangles is created from a uniquely shared set of properties: the number of edges, how the edges connect, and so on. Groups in mathematics are defined by definitions. Human groups are formed more organically; there is usually a shared trait that most of the people in the group have. For example, most politicians are involved with the government in some capacity, but some might call themselves politicians simply because they want to break into that inner circle. The groups in our daily usage are generally defined by examples. This blurriness of human groups has led many sciences that study human behavior, such as sociology, psychology, economics, and biology, to favor statistics for analyzing humans.

Statistics is a can of worms that deserves its own episode later, but I will say this here. When used correctly, statistics can provide incredible insights into human nature. But when used wrongly, it can be just as problematic as all the other fallacies. We often think that because mathematics and numbers are objective, if we use them and do the calculations correctly, our conclusion must be true. That is, to put it bluntly, dead wrong. Here is an example: I claim that if we put 2 ducks into 2 liters of water, we will get 4 duck-waters, because 2 plus 2 equals 4. It is obviously absurd, because we can’t add 2 ducks to 2 liters of water. But even if we add 2 liters of water to 2 liters of propanol, due to the space between molecules we will get about 3.94 liters of the mixture, not 4. That is, algebraic addition is not a good model for this phenomenon. If we have a bad model, we can do all the math right and our conclusion will still be wrong. By the way, this is a “reductio ad absurdum”, or proof by contradiction, argument for why not all arguments that use numbers are correct.
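
A tiny Python sketch makes the point, reusing the 3.94-liter figure quoted above (the function is mine, for illustration only): the arithmetic is flawless, but the model behind it is not.

```python
def naive_mixed_volume(v1_liters, v2_liters):
    """Bad model: assume volumes always add algebraically."""
    return v1_liters + v2_liters

water, propanol = 2.0, 2.0
predicted = naive_mixed_volume(water, propanol)  # 4.0 L, and 2 + 2 = 4 is correct
measured = 3.94                                  # the figure quoted in the text above

print(f"model predicts {predicted} L, measurement gives {measured} L")
# The addition is right; the assumption that mixing preserves volume is wrong.
```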

In contrast to the universal quantifier, which describes properties shared by an entire group, the existential quantifier conveys properties of specific cases. For instance, the claim that all swans are white is a universal statement about every swan. But the statements “there exist swans that are not white” and “some swans are not white” are existential statements: they claim that at least one swan in this universe is not white.

English speakers know that “Not all swans are white” is not the same as “All swans are not white”. The first statement is an existential claim, that some swans are not white; the latter is a universal claim, that all swans share the property of not being white.

The negation of a universal quantifier is an existential quantifier, and vice versa. For example, what is the opposite of “All swans are white”? It is not “All swans are not white”, but “Not all swans are white”. I should first define what “opposition” and “negation” mean in logic. Two statements A and B are opposites if they cannot both be true at the same time, and, in addition, one of them must be true, that is, they cannot both be false at the same time either. If you care about the mathematical jargon, two statements that cannot both be true are called mutually exclusive, while two statements that cannot both be false are called complementary. So two statements are opposites if they are both mutually exclusive and complementary. “All swans are white” and “All swans are not white” are mutually exclusive, because they cannot both be true, but they are not complementary, because if some swans are white and some swans are not, both statements are false. The opposite, or negation, of “All swans are white” is “Not all swans are white”, or “Some swans are not white”. The negation goes both ways, so the opposite of “Some swans are not white”, or “Not all swans are white”, is “All swans are white”. You may have noticed that the negation of a universal quantifier is an existential quantifier, and the negation of an existential quantifier is a universal quantifier.
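
In the notation of formal logic, with White(x) standing for “swan x is white”, the two directions of this negation read:

```latex
\neg\,\forall x\,\mathrm{White}(x) \;\Longleftrightarrow\; \exists x\,\neg\mathrm{White}(x)
\qquad\qquad
\neg\,\exists x\,\neg\mathrm{White}(x) \;\Longleftrightarrow\; \forall x\,\mathrm{White}(x)
```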

The mutual exclusivity and complementarity of the universal and existential quantifiers lie at the heart of the falsifiability of the scientific method. Consider the theory mentioned above: “All swans are white”. How can we know if it is false? Well, if we find a swan that is not white, that is, if there exists a swan that is not white, we know the theory must be false. But how can we know if it is true? This is tricky, as we would have to examine all possible swans, past, present, and future, to make sure every single one of them is white. That is impossible. In other words, we cannot prove a theory stated with a universal quantifier to be true; we can only show it to be false by finding a counterexample.

Now, how about this theory: “Some swans are white”. Note that this theory states an existential claim instead of a universal one. Well, if we find a swan that is white, we have shown the theory to be correct. But can we show that it is wrong? Just as proving that all swans are white would require examining every swan across space and time, proving that no swans are white, which is the negation of “some swans are white”, would also require us to examine all swans to make sure that every one of them is not white. So proving the falsity of an existential claim, just like proving the truth of a universal one, is an impossible feat.
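
A toy Python sketch of this asymmetry, using a hypothetical finite sample of observed swans (the data are made up for illustration):

```python
# A hypothetical finite sample; the real population of swans is unbounded.
observed_swans = ["white", "white", "black", "white"]

# "All swans are white": a single counterexample in the sample falsifies it.
universal_falsified = any(color != "white" for color in observed_swans)    # True

# "Some swans are white": a single example in the sample verifies it.
existential_verified = any(color == "white" for color in observed_swans)   # True

# The reverse directions, verifying the universal claim or falsifying the
# existential one, would require checking every swan past, present, and future,
# which no finite sample can ever do.
print(universal_falsified, existential_verified)
```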

So what statements or theories can be considered scientific? In the first episode, I said that what was revolutionary about Newton’s theory of motion is that it is a unified framework that, as far as could be demonstrated at the time, all things in the universe must follow. From the beginning of history, humans have liked to see patterns between events. We read the stars to predict future events and cast bones to tell our fortunes, but those predictions are more often misses than hits. Newton’s theory of motion offers the comforting certainty we desperately want: using his equations, we can calculate the future returns of Halley’s comet with as much certainty as we know the Sun will rise tomorrow. This kind of certainty allows us not only to understand events of the past but also to produce accurate predictions of the future, which is the key to the reproducibility principle laid out in the last episode. The only way for a scientific theory to provide accurate, reproducible results in the future is for it to be stated using a universal quantifier.

If I theorize that “all swans are white”, then whenever you find a swan, if the theory is true, I can tell you that it must be white without seeing it myself. This swan theory states a definitive relationship between the species of an animal and its color. Similarly, in Newtonian mechanics, if we know the velocity of a cannonball and how far it is from the ground, I can predict, to astonishing accuracy, all of its future positions before it hits the ground, provided nothing else disturbs it.
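
As a minimal sketch of that kind of prediction, here is the Newtonian calculation for an undisturbed cannonball, ignoring air resistance (the launch numbers are made up for illustration):

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def position(x0, y0, vx, vy, t):
    """Position at time t under constant gravity, with no air resistance."""
    return x0 + vx * t, y0 + vy * t - 0.5 * G * t ** 2

def time_of_flight(y0, vy):
    """Time until the ball falls back to ground level (y = 0)."""
    return (vy + math.sqrt(vy ** 2 + 2 * G * y0)) / G

# Launched from 10 m above the ground at 30 m/s horizontally and 20 m/s upward:
t_hit = time_of_flight(10.0, 20.0)
for t in (0.5, 1.0, 2.0, t_hit):
    x, y = position(0.0, 10.0, 30.0, 20.0, t)
    print(f"t = {t:.2f} s: x = {x:.1f} m, y = {y:.1f} m")
```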

What if I theorize that “some swans are white”? If you find a swan, and the theory is true, what color will the swan be? Well, we don’t know. It could be white, black, or even purple for all we know; that is, this theory is almost useless for predicting the future. Freudian analysis or other types of data interpretation may give us some insight into what happened in a specific event, but if they cannot be used for future predictions, they are not scientific.

Science tries to find generalized relationships between observations, whether between the current position and velocity of a rock and its future position and velocity, or between a patient’s symptoms and the drugs they should take. A single event tells us little about what the relationship is. During the plague, it was rumored that Queen Elizabeth received a ring that purportedly protected her from the disease. It may seem absurd to us that a magical ring could protect against a bacterial infection, but Queen Elizabeth never did catch the plague, and as absurd as the argument may sound, it is simply impossible to show a connection between two events to be false based on a single occurrence. And it is not just the argument about the magic ring and its healing power that is problematic; we engage in this type of reasoning quite often. Say that I found a new diet online, tried it for two weeks, and afterward was two pounds lighter. Note that I did not say the diet made me two pounds lighter. Based on the observability principle, I did not observe how the diet affected my weight, only that I went on a new diet and, two weeks later, weighed two pounds less. You may have heard the phrase “correlation is not causation” before. Here the new diet and the weight loss are only two correlated observations; we have not yet established that one caused the other. But can we conclude that the diet caused my weight loss? Well, if we do, then by the same logic we would also have to accept that the ring protected Queen Elizabeth from the plague, because we have exactly the same amount of information in both cases.

So, OK, one occurrence is not enough to prove causation; how about 100 cases? If I post on a forum and find 100 people who have also used the diet and observed an associated weight loss, can we conclude that the diet causes weight loss? The evidence certainly feels more convincing. To some people, it may even be enough to conclude that the diet works, and they will be willing to try it themselves. As mentioned previously, the logical process from the general to the specific is called deductive reasoning: if all humans are mortal, then this human is also mortal. In contrast, the process from the specific to the general is called inductive reasoning, inference, or proof by experience: if we find one human to be mortal, then another, then another, with enough examples we conclude that all humans are mortal. Inductive reasoning may sound appealing or even feel intuitive, because it is how we learn new concepts most of the time. As toddlers, we did not learn what the color red was by being told that red is light with a wavelength of 620 to 750 nm. No, we were shown pictures of red apples, red cars, red cubes, and we learned by finding the commonality among them, the specific color shared by all of the pictures. Even mathematics starts this way: most of us never learn the formal definitions of addition or multiplication, but anyone who has gotten through elementary school learns how to add and multiply just by being given many, many examples and doing many, many exercises.

Inductive reasoning is intuitive, but there is just one problem with it: it does not always lead to true conclusions. In the last episode, I discussed the constant measured speed of light and the discovery of special relativity. I will use this example again. Based on our experience, if I am running on a moving train, someone off the train measuring my speed would get the arithmetic sum of the train’s speed and my running speed. If we run the experiment with different train speeds and different running speeds, we see that the measurement is always the arithmetic sum of the speed of the moving train and the running speed of the person. And that is what people observed for centuries. By inductive reasoning, we would conclude that speeds in all frames of reference combine as arithmetic sums. But there is just one problem: the measured speed of light is always constant. You cannot shoot a beam of light from a moving train and get faster light than if you shoot it from the ground. Inductive reasoning has led us to a wrong conclusion. And it is not just the speed of light that induction can get wrong.
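
The resolution, worked out in special relativity, is that velocities do not combine as arithmetic sums at all. For a runner (or a light beam) moving at speed u relative to a train that moves at speed v relative to the ground, the ground observer measures

```latex
u' = \frac{u + v}{1 + \dfrac{u\,v}{c^{2}}}
```

At everyday speeds the correction term uv/c² is vanishingly small, which is why centuries of observation made the arithmetic sum look exact; and when u = c, the formula gives u' = c no matter how fast the train moves.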

Machine learning and AI have recently become hot topics in the public consciousness. Most machine learning algorithms are designed to use inductive reasoning to find connections between variables. For instance, you can provide a collection of pictures of scientists and train the machine to learn what kind of person looks like a scientist. The computer will try to find common features among those pictures and treat them as the defining features of scientists. But if we give it examples of scientists who are mostly male, the algorithm will recognize a strong correlation between the male gender and being a scientist and conclude that males are more likely to be scientists. It is not very different from human reasoning: at the turn of the 20th century, when female doctors were rare, most people had only ever seen male doctors and could easily conclude by induction that only males were capable of being doctors and that medicine was not a profession for women. In one study conducted by a team at the University of Washington, the researchers set out to train a machine-learning algorithm to tell huskies apart from wolves. To their surprise, the algorithm was able to tell huskies and wolves apart about 90% of the time. But the researchers later realized that instead of learning the subtle differences between huskies and wolves, what the algorithm had learned was that the images labeled as wolves mostly had snowy backgrounds, while the images labeled as huskies mostly had grassy backgrounds. So by induction, the algorithm reasoned that if the background has snow, the animal must be a wolf, and if the background has grass, the animal must be a husky. When the researchers digitally altered the images and put a husky onto a snowy background, the algorithm identified the husky as a wolf.
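
Here is a toy, purely illustrative Python sketch of that failure mode. Real image classifiers work on pixels, but to keep it readable each “image” is reduced to two hand-made features, background and ear shape, with the training labels correlating perfectly with the background:

```python
from sklearn.linear_model import LogisticRegression

# Features: [background, ear_shape]; background 0 = grass, 1 = snow;
# ear_shape 0 = floppy, 1 = pointed. Labels: 0 = husky, 1 = wolf.
# In this made-up training set, every wolf photo happens to be on snow.
X_train = [[0, 0], [0, 1], [0, 0], [0, 1],   # huskies, all on grass
           [1, 1], [1, 1], [1, 0], [1, 1]]   # wolves, all on snow
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression().fit(X_train, y_train)

# A husky (floppy ears) digitally placed on a snowy background:
print(model.predict([[1, 0]]))  # -> [1], classified as a wolf
print(model.coef_)              # the background coefficient dominates
```

By induction over its biased sample, the model has learned “snow means wolf” rather than anything about the animal itself.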

It is not just wolves and huskies that an inductively reasoning machine can get wrong. In the United States, some states, such as New York, Wisconsin, California, and Florida, use risk assessment software called COMPAS in court to assess the risk that a convicted criminal will commit another offense in the next two years. Based on the risk assessment provided by COMPAS, a judge may hand down more severe sentences to higher-risk convicts. In 2016, ProPublica launched an investigative series on the algorithm and found that black defendants were almost twice as likely as white defendants to be labeled higher risk by COMPAS. Many people, based on their own inductive reasoning, may agree with the conclusions drawn by the software. But that is not the end of the story: ProPublica also followed up with people assessed by COMPAS in the following years and found that white defendants labeled lower risk were much more likely than black defendants labeled lower risk to go on to commit other crimes; furthermore, only 20 percent of the people predicted to commit violent crimes actually went on to do so.

In the age of big data, it is certainly spooky how much the large internet companies know about us. They can recommend advertisements, news, or shows that are tailored specifically toward our engagement. But I think it is even scarier when we start to blindly trust the predictions of black-box AIs without understanding the inner workings behind those predictions. When we start to use AI for medical diagnostics or criminal trials without understanding why its recommendations were made, or even whether they are correct, the mistakes can do the most harm.

Inductive reasoning makes a lot of intuitive sense, and it can be a useful shortcut for everyday decisions. If I have liked most Disney movies in the past, I can inductively reason that I am going to like the new one coming out and make plans to catch it in the theater, but, of course, that in no way guarantees that I will like the new movie. From a pragmatic point of view, we have only limited information in life, and it is practical to use our past experience and inductive reasoning to make decisions. But in no way can inductive reasoning be used as proof of the validity of our beliefs. As we have seen above, inductive reasoning can lead to prejudiced conclusions that can be extremely harmful to those who are judged wrongly, or to misguided policies, from racial segregation to eugenics, that cause large-scale suffering.

Science is ultimately concerned with what is true, not what is practical. Therefore we cannot rely on inductive reasoning for scientific inquiry, only deductive reasoning. In addition, we want to use the scientific method to find universal laws that we can use to predict the future. But this leaves us with a dilemma, because we can never prove a universally quantified statement about the real world to be true; we can only show that it is false by contradiction. Indeed, modern scientific discoveries are called theories instead of laws, such as the theory of evolution or the theory of relativity, because scientists in the early 20th century realized that what we had considered truth or universal law, from Newton’s laws of motion to the law of supply and demand proposed by Adam Smith, turned out to be not so universal after all. We now use the word theory for scientific discoveries not because they are merely hunches, but because we understand that even though these theories pass the most diligent tests we can perform at the moment, we cannot be sure that in the far future, with the aid of more accurate instruments and experimental setups, we will not find counterexamples demonstrating that our theories are false after all.

This is at the core of the falsifiability principle. We can never be certain that our theory is right. The scientific method requires systematic doubt, as proposed by René Descartes. Our best chance to show that a belief is true is to turn over every rock and do everything in our power to find evidence and examples showing that the belief is false. Only when we have tried our hardest to question and doubt what we believe, and have failed, may we have faith that our belief might be true.


In recent years, we have started to accept the uncertainty of the universe, and instead of seeking scientific theories of certainty, we have shown a preference for using statistical analysis to form theories of uncertainty about how the world works. There is a certain appeal to it. There are so many unknowns in the world, and so far it is mostly only in very idealized laboratory settings that we can gather enough information to make accurate predictions. With statistics and probability, we can still make educated and informed decisions with limited information. But statistics can be misleading as well. I will discuss the fallacy of the p-value later, but here is one important note. For a universal claim, when something the claim rules out happens, we know our premise is wrong: when I claim all swans are white and you swing a black swan in my face, that is sufficient evidence to show that I was wrong. But if I claim that most swans are white, you cannot prove me wrong by showing me a black swan, or even use it as evidence that my claim is unlikely to be true, tempting as it may feel. Unlikely events happen all the time: people win the lottery or get struck by lightning, but those occurrences are not evidence that winning the lottery or being struck by lightning is common. Falsifiability works differently for statistical tests than for theories of certainty; I will pose it as a challenge for the listeners to think about how to use the consistent measurements of the observability principle to falsify a statistical hypothesis.
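
To see why a single black swan cannot topple the statistical claim, here is a minimal Python calculation under a made-up version of it, say “90% of swans are white” (the numbers are hypothetical, chosen only for illustration):

```python
p_white = 0.9   # hypothetical claim: 90% of swans are white
n = 10          # you go out and observe 10 swans

# Probability of seeing at least one non-white swan even if the claim is true:
p_at_least_one_black = 1 - p_white ** n
print(round(p_at_least_one_black, 3))   # ~0.651
```

A black swan in your sample is entirely compatible with the claim, so unlike a universal statement, the statistical one is not overturned by a single counterexample.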
