Scientific Method: 1. Observability

Starting in middle school, I was required to learn physics, chemistry, and biology. In college, I had to study social science and political science to fulfill my general education credits. In retrospect, I have been studying science for almost all my life. But it was only about two years ago that I started asking questions about the nature of science. I realized that I did not know the formal definition of the scientific method, and I had no idea why it should tell me the truth about the world around me. I know that physics and chemistry are science, while history and literature are art. But those are examples of science. What makes physics science, and history not? Is the difference the use of experiments? Physicists conduct experiments, but so do many modern historians. They use carbon dating to determine the age of historical documents. Furthermore, alchemists were also heavily dependent on experimentation, yet we usually do not consider alchemy a science. Throughout this series, I will talk about different facets that define science and provide arguments for why we think that it works. Whether you are reading or listening, I hope it can help you understand how scientific discoveries are made, and help you differentiate between what is science and what is pseudoscience.

The word science comes from the Latin scientia, meaning knowledge. The scientific method is a systematic way of making knowledge about the natural world. But the skill of knowledge-making is not unique to humans. For instance, chimpanzees pass down the knowledge of termite fishing across generations. Crows can understand water displacement and use stones to extract water from deep bottles. But there is no doubt that humans have developed the most sophisticated methods for knowledge-making, far beyond those of any other known life form. From the language we use, such as formal names for the elements, chemical compounds, and even mathematics, to measuring instruments, some as simple as rulers and clocks, others extraordinarily complex, like the Hubble Space Telescope orbiting the Earth, we have come a long way since the dawn of humanity, and we have learned wonders about the nature of knowledge itself.

Before physics, there was natural philosophy; before astronomy, there was astrology; before chemistry, there was alchemy. We often think what differentiates science from pseudoscience is the subject of inquiry. But that is actually not the case. Natural philosophy and physics both study everyday phenomena, astrology and astronomy both study the celestial bodies, and alchemy and chemistry both study the properties of and interactions between materials and matter. The true difference between science and pseudoscience (at least as Karl Popper defined it) lies in the methodology. The scientific method I refer to throughout this series is the method used in modern physics. In other fields, such as medicine, psychology, and sociology, the methodologies adopted are not exactly the same. I will do my best to present arguments and evidence, as a scientist myself, to demonstrate why we believe that the scientific method is the only way known to humans that leads to the objective truth of nature.

Isaac Newton is often referred to as the father of modern science. In his book Mathematical Principles of Natural Philosophy, Newton not only introduced calculus to the world (a subject many college students dread) but also proposed a unified theory of motion demonstrating the existence of universal laws that govern not only the celestial bodies but also everything else we interact with in our daily life. The most revolutionary part of his theory, beyond the derivation of how the planets revolve around the sun, is the idea of universal laws: there exist immutable laws that can describe everything in this world, and nothing is exempt from them. With those laws, we can not only predict the future but even control and shape it to our own ends.

But what is not well known is that besides his contribution of founding physics, Isaac Newton was also an active alchemist. According to his writings, he actively experimented and sought the philosopher’s stone, which he believed could convert base metals to gold and was also the essential ingredient in creating the Elixir of Life. Newton devoted much of his life to the search for the fabled stone.

The scientific method has been evolving throughout history, and the methods we use in different fields will keep evolving with our understanding of the subjects and of knowledge itself. Even the greatest minds in history, such as Newton, can and have fallen into believing pseudosciences such as alchemy and astrology. It is dangerous to presume that any of us would somehow be exempt from those mistakes. Therefore I think it is important, whether as a student, a researcher, or just an enthusiast who occasionally reads science news, to think beyond the truth as simply what we have been taught and told; to not only understand what the scientific method is, but also contemplate why we think it leads to objective truth, or even whether it will lead to the truth at all; and finally to be able to come to conclusions on our own terms based on solid logic and evidence. I am glad that you are here to take this journey with me.


The first facet of the scientific method I will discuss is observability. Let’s start with a thought experiment:

Suppose we are given a coin. We flip it again and again, and every time we perform this test, the coin gives about 70% heads and 30% tails.

Given this observation, we form two different conclusions.

Conclusion one: there is a manufacturing defect in the coin that is biasing the odds of heads and tails.

Or conclusion two: the coin is fair, but a supernatural force is interfering with our experiments.

From our modern perspective, it is easy to argue that conclusion one is more scientific than conclusion two, because we think manufacturing defects are simply more likely than supernatural forces. But actually, neither conclusion is scientific, because neither was directly observed in the experiments. The only thing we have witnessed and measured through this coin toss test is that the coin produces 70% heads and 30% tails, repeatedly. And nothing more. We observed neither the manufacturing quality of the coin nor any supernatural forces.

Science is empirical. We can only make statements about what we have observed. As for things that we have not measured, it is not the job of science to fill in the blanks with theories that guess why something happened. The observability of the scientific method means that, in science, we can and should only make claims about what we observe and measure. Interpreting or inferring beyond the data is not part of the scientific method. The question of what can be observed is the very foundation of the uncertainty principle in quantum mechanics, first described by Heisenberg.

So imagine there is a basketball lying on a field. When we observe the basketball, what we actually see is the light that bounces off the ball. We know that photons, the particles of light, carry momentum; therefore, when they bounce off the basketball, they also exert forces on the ball. But because the basketball is so massive in comparison to photons, the force exerted on the basketball by the light is not enough to move the ball. Now suppose we shrink the basketball smaller and smaller, until we can no longer ignore the force exerted by a photon. Then, when we bounce a photon off the ball, we can see where the ball was at the moment of impact, but we have also inevitably moved the ball by hitting it with the photon. This is the core idea of the uncertainty principle.

For every physical object, especially quantum particles, to measure it we have to interact with it using other objects. In the example above, that measuring object is a photon. This interaction will inevitably change the current state of the object we are trying to measure. In the context of measuring electrons, for example, to acquire a more accurate reading of the electron’s position, we need to use photons with higher energy, which in turn changes the electron’s velocity more during the impact. Therefore, there is a trade-off between the certainty of measuring the electron’s position and its velocity: a more accurate measurement of position leads to more uncertainty in velocity, and vice versa. Hence, the uncertainty principle.
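This trade-off can be stated quantitatively. In its standard modern form (the Kennard bound, which sharpens Heisenberg’s original argument), the uncertainties in position and momentum obey:

```latex
\Delta x \, \Delta p \;\ge\; \frac{\hbar}{2}
```

where Δx is the spread in position, Δp the spread in momentum, and ħ the reduced Planck constant. Halving Δx forces Δp to at least double, no matter how cleverly the measurement is performed.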

You may have heard of Schrödinger’s cat before. It is probably the most famous allegory of quantum superposition, featured in many shows, from The Big Bang Theory to numerous YouTube videos. Schrödinger’s cat aims to give a large-scale analog of the double-slit experiment. A simplified version of the double-slit experiment can be described as follows: consider a light source that is capable of emitting a single photon at a time. In front of the light source, we place a dark sheet with two slits in it, 1 and 2. The distance between the two slits is relatively small. Behind the sheet, we have a screen that shows us where each photon lands. If we block one slit, we see that the photon hits the screen at a position behind the other, open slit. But if both slits are open, the screen shows an interference pattern, as if the single photon had passed through both slits at the same time. This interference pattern consists of bright and dark stripes and is completely different from simply adding the results of the two single-slit tests. The observed pattern on the screen seems to suggest that the photon passed through both slits at the same time. The Copenhagen interpretation states that we can therefore consider the photon to be in a superposition of passing through slit 1 and slit 2 at the same moment. Schrödinger’s cat is a thought experiment demonstrating this interpretation on a larger scale. Consider a cat closed in a box with a poison-releasing device: if the photon passes through slit 1, the device releases the poison and kills the cat; if the photon passes through slit 2, it does not. Before we observe which slit the photon has passed through, the cat, by analogy, can be considered to be in a superposition of dead and alive at the same time.

Even though Schrödinger’s cat has become a paradigmatic example used by pop science to explain quantum mechanics to the general public, it was originally proposed as a criticism of the superposition interpretation. To quote Schrödinger: “One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along with the following device, … the entire system would express this by having in it the living and dead cat (pardon the expression) mixed or smeared out in equal parts.” Schrödinger was critical of the existence of the living-dead cat. And Einstein agreed.

Einstein later wrote to Schrödinger: “You are the only contemporary physicist, …, who sees that one cannot get around the assumption of reality if only one is honest. Most of them simply do not see what sort of risky game they are playing with reality—reality as something independent of what is experimentally established… Nobody really doubts that the presence or absence of the cat is something independent of the act of observation.”

Indeed, simply because we have not seen the cat in the box does not mean the cat is both dead and alive at the same time; and similarly, simply because we have not seen the photon crossing the slits does not mean it passed through both slits at the same time. Reality does not depend on our observation. It is risky to make guesses about what we have not seen, because if our guess is wrong, how can we know?

Then if the photon did not pass through just slit 1, or just slit 2, or both at the same time, how did it get to the screen? The simplest answer is: we do not know, because we cannot see it. As Einstein pointed out, it is a dangerous game to make statements about unobserved reality, or to interpret data and extrapolate information beyond what we have measured. When we make a prediction and it contradicts our observation, we know our prediction is wrong. But when we interpret data and fill in the blanks about the cause of what we have seen, how can we know our interpretation is correct? Yes, we can have many different guesses, some of which might make more sense to us than others. For example, manufacturing defects might make more sense to us than supernatural forces. But at the end of the day, what makes sense to us depends on each person’s experience and beliefs. There is no objective way to tell which guess is better than the others. The result of the unfair coin toss may be due to a defect in the coin, it may be due to a magnet hidden somewhere we cannot see, or it could be some other reason we do not know about. We can say which explanation seems more likely to us, but we do not have enough information to decide which one is true. Therefore, we need to differentiate what we have observed from our guesses about what has happened. When doing science, for the sake of objectivity, we should deal only with observations. We do not see the light traveling through the air, or which slit it passes through; we see only where the photon was measured on the screen. Therefore, we cannot, and should not, make any claim about how it passed through the two slits. It is not in the business of science to speculate about what might have happened or what would have happened. For this reason, medical diagnostics is generally not considered an exact science. Doctors measure blood pressure, order lab work and MRI scans, and then use those measurements to theorize about the underlying cause of the patient’s symptoms, while the cause itself is often not directly observable or measurable. But those diagnoses do usually rely on scientific discoveries that are based on direct observation.

Take diabetes diagnoses as an example. A type 1 diabetes diagnosis comes with the conclusion that the patient’s pancreas produces little or no insulin. But the doctor usually only measures the blood sugar and blood insulin levels after fasting. We do not directly observe how insulin is produced by the pancreas. But given that the pancreas is the only organ we know of that produces insulin, it is natural to conclude that the pancreas is failing when the blood insulin level is low. Of course, a low insulin level could also be because the insulin was consumed by the body to process sugar; that is why fasting is required before insulin tests. How do we know the pancreas produces insulin in the first place? In 1889, two physiologists, Oskar Minkowski and Josef von Mering, removed the pancreases from several lab dogs. Later, they observed symptoms of diabetes in those dogs. They therefore suggested that the pancreas plays a crucial role in regulating blood sugar, which ultimately led to the discovery of insulin. In 1916, Nicolae Paulescu developed an aqueous pancreatic extract that was successfully used to normalize the blood sugar levels of diabetic dogs. Unfortunately, this pancreatic extract was not pure enough and could cause severe allergic reactions when used on humans. In 1922, five researchers from the University of Toronto, Banting, Best, Collip, Macleod, and FitzGerald, created the first purified pancreatic extract, also known as insulin, and sold the patent for one symbolic dollar, so that it could only be used to benefit patients, not any company that might hold a monopoly over this life-saving drug. But unfortunately, the American companies that obtained the rights to insulin were able to extend their patents and hold an oligopoly over the insulin market in the U.S., and the price of insulin in the U.S. is now about 10 times the price in Canada, where it was originally discovered almost a hundred years ago.

Sometimes experiments produce results that agree with our theory, and sometimes they do not. When we conduct an experiment and do not get the result we expect based on our hypothesis, it is easy to speculate that something has gone wrong with the experimental setup. Modern experiments, from chemical reactions to medical tests, often require complex procedures, and they can be, and often are, subject to mistakes. When we get a result that contradicts our hypothesis, it could be due to a mistake in conducting the experiment, or it could simply be that the hypothesis is wrong. Just as we cannot make any claims about whether the cat is alive or dead before we open the box, we cannot say whether the mismatch between the result and our prediction is due to experimental error or a false hypothesis. An honest scientist reports all of the experimental data, both positive and negative, and makes no attempt to explain away the negative results.

I have talked about observation and measurement for a while now. But what constitutes a valid observation or measurement?

Anyone who has ever used a ruler has performed a measurement. There are two types of measurement: numerical and categorical. A numerical measurement, like what we get from using a ruler, produces a numerical value as its result, while a categorical measurement produces a category. For example, when we fill in a survey about our nationality, our ethnicity, or the school we graduated from, those are all categorical measurements.

But simply because a measurement produces a number or a category does not mean that it is good enough for science. Measurements also need to be standardized and consistent. For example, when we measure the length of a table, we expect the same number whether the measurement is done by a human or a machine. In our everyday life, we are so accustomed to machines making measurements for us that we hardly even think about it anymore. We no longer measure time based on the angle of the Sun relative to the horizon, but simply read it off our cell phone screens. When we drive, our car measures the number of miles we have driven and tells us how fast we are going. The reason we trust those machines to perform measurements for us is that we know they are consistent. Every day at the same time, every phone gives us the same hour and minute in the same time zone. On the highway, cars can maintain their distance while driving at the same speed limit, because the readings of their speedometers are consistent and standardized across all cars. When we measure the same thing twice, we expect the same result.

When we measure length and time, everyone agrees on what a meter or a second is, as they are defined by international standards. But when we talk about other types of measurement, such as the unemployment rate, things get murkier. According to the U.S. Bureau of Labor Statistics, there are six different standards for calculating the unemployment rate, U-1 to U-6. For example, the U-3 standard counts anyone who has worked at least one paid hour in the past week as employed, while the U-6 standard also counts anyone who was not given enough work hours as unemployed. So using the U-6 standard will naturally give a higher unemployment rate than the U-3 standard. Unfortunately, many unemployment reports in the news omit which standard the rate is measured under. When comparing numbers from news or experiments, whether unemployment rates at different times or mortality rates of different diseases, we should first make sure that the numbers are measured using the same standard. Just as we cannot compare 1 meter with 2 seconds, or the U-3 unemployment rate with the U-6, we cannot make meaningful comparisons when statistics have different units. A 20% unemployment rate under the U-6 standard does not mean 20% of Americans are out of a job, and a 5% unemployment rate under the U-3 standard does not mean only 5% of people lack full-time jobs. Before we draw conclusions from those numbers, we should first understand how they are defined and measured.

Here is another example. Many sociological and psychological studies are interested in human happiness. But how can we measure human happiness, which is inherently subjective and invisible? Instead of directly measuring people’s happiness, what researchers usually measure is what people say about how happy they are. These are called proxy variables. Proxy variables can give us insight into things that may not have objective definitions. But we should not mistake a proxy variable for the original variable of interest. What people say about how happy they are is not the same as how happy they really are. Similar proxy variables are also used in personality tests and intelligence tests. We do not directly measure human personality or intelligence, but rather how people respond to hypothetical scenarios and how they perform on a set of human-designed tests. But does an objective measurement of human personality or intelligence truly exist? Or are personality and intelligence just constructs we made up to make sense of the world, which do not reflect reality? If so, then isn’t it ludicrous to want to categorize all humans into a small set of personalities, or to compare human intelligence by reducing such a complex concept to just a couple of numbers?

We have so far discussed what good measurements are, but how do we deal with measurement errors? When we measure the length of an object using a ruler, we might get slightly different numbers just because we align the ruler a bit differently each time. Furthermore, as stated by the uncertainty principle, there is a fundamental limit on how accurately we can measure both the position and the momentum of a particle. But if there is inherent uncertainty, how can we repeat the same experiment and always expect the same results? Well, it depends on what we are measuring. When dealing with probability, a consistent measurement is not concerned with the behavior of a single case, but with the distribution of the aggregate. Coming back to our coin toss example, when we toss a coin 8 times in a row, we are likely to get a different sequence of heads and tails with every set of 8, which means that the measurement of the heads-and-tails sequence is not consistent. What is consistent is the measurement of the probability: when we toss the coin enough times, we get a very close estimate of the chance of heads. Similarly in quantum mechanics, when we measure the position of a photon in the double-slit experiment, what interests us is not the position of each photon, but the position distribution of many, many photons.
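This distinction between a single outcome and the aggregate is easy to demonstrate numerically. Here is a minimal Python sketch (the 70/30 coin and the sample sizes are illustrative choices, not measurements): any individual sequence of flips is essentially unrepeatable, yet the observed frequency of heads converges as the number of flips grows.

```python
import random

random.seed(42)  # fixed seed so the sketch itself is repeatable

def flip(p_heads=0.7):
    """One flip of a biased coin: returns True for heads."""
    return random.random() < p_heads

# A particular sequence of 8 flips is essentially unreproducible...
print("one sequence of 8:", ["H" if flip() else "T" for _ in range(8)])

# ...but the aggregate frequency of heads converges toward 0.7.
for n in (10, 1_000, 100_000):
    heads = sum(flip() for _ in range(n))
    print(f"{n:>7} flips: observed frequency of heads = {heads / n:.3f}")
```

The sequence measurement is inconsistent from run to run; the frequency measurement is consistent once the sample is large enough.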

Let us take a deeper look at how to measure probabilities. I claimed that the coin has a 70% chance of landing heads and a 30% chance of landing tails. How can we confirm this with measurements? If we flip the coin once and get a head, does that mean the coin has a 100% chance of landing heads, and my claim was wrong? Of course not. We cannot make a probabilistic claim with a sample size of only one. But what about two? Now we might get one head and one tail; that’s 50% and 50%. Does that mean the coin is fair? Again, no. I will skip the math, but the chance for a 70/30 coin to give one head and one tail in two tosses is about 42%. Also, the only probabilities we can measure with two tosses are 0%, 50%, and 100%. We need more data, a lot more data. When we flip the coin 10 times, we have a 27% chance of getting exactly 7 heads and 3 tails. That is still not very high. How about 100 times? We have an 8.7% chance of getting exactly 70 heads and 30 tails. Wait a moment, the probability just dropped! It may seem paradoxical at first, but consider the following scenario: two people measure the length of a short bridge. If we ask them to measure to the accuracy of a meter (about three feet), it is far more likely that their measurements will match exactly than if we ask them to measure to a centimeter (less than half an inch). That is, the more accuracy we demand, the less likely the results will match exactly. But if we agree on an acceptable error margin before the measurement, then it becomes more likely that the measurement can be reproduced. So let us make an agreement: instead of requiring the measurement to be exactly 70%, you will accept that my claim is true if the measurement falls between 69% and 71%. So, what is the chance of getting 69 to 71 heads if we flip the coin 100 times? It turns out to be only about 26%. That is, there is a good chance your measurement will contradict my claim about the coin after 100 flips, even if I was telling the truth! How about 1,000 times? The chance of getting 690 to 710 heads is about 53%. Better, but still not great. What if we want to be 99% sure that I wasn’t lying about the coin? That would require about 15,000 flips. That is a lot of coin flips. It does put drug trials with hundreds of patients into perspective, doesn’t it?
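The numbers above come straight from the binomial distribution. Here is a small Python sketch, using only the standard library, that reproduces them so you can check the claims yourself:

```python
from math import comb

def binom_pmf(k, n, p):
    """Chance of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def chance_within(n, p, lo, hi):
    """Chance that the head count falls in [lo, hi] over n flips."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

p = 0.7
print(f"1 head in 2 tosses:  {chance_within(2, p, 1, 1):.2f}")        # ~0.42
print(f"exactly 7 of 10:     {binom_pmf(7, 10, p):.2f}")              # ~0.27
print(f"exactly 70 of 100:   {binom_pmf(70, 100, p):.3f}")            # ~0.087
print(f"69-71 of 100:        {chance_within(100, p, 69, 71):.2f}")    # ~0.26
print(f"690-710 of 1000:     {chance_within(1000, p, 690, 710):.2f}") # ~0.53
```

Note how the chance of an exact match shrinks as the sample grows, while the chance of landing inside the agreed error margin improves.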

So does that mean we just need to perform millions of tests and we can always get an accurate measurement of the probability? Not quite. Certain events are so unlikely that even millions of trials may not be enough. Here is an example. Let’s say that I bought a lottery ticket and hit the jackpot. It is unlikely, but every day hundreds of millions of people buy lottery tickets, and thousands upon thousands of winners are drawn. It is not impossible, merely improbable, that I hit the jackpot with my first ticket. Now, I am interested in measuring the likelihood of someone hitting a jackpot. So I buy another ticket, and I don’t win this time. Does that mean the jackpot win rate is 1 in 2? Again, no; the sample size is too small. So I buy 1,000 more tickets: no jackpot. 1,000,000 more tickets: still no jackpot. Does that mean the chance of winning the jackpot is 1 in a thousand, 1 in a million, or 1 in a billion? We don’t know, because we only have one positive sample. If we bought 1 million tickets and won 1 thousand times, we could be quite confident that the chance of winning is about 1 in a thousand. But when we have only one or a few positive samples, it is impossible to accurately measure the probability of a positive result. Therefore, in addition to a large sample size, we also need a significant number of positive results to accurately measure their probability. But unfortunately, for most scientific research, data collection is expensive. Many research papers are published with relatively small sample sizes and perhaps a handful of positive results.
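A quick simulation makes the point concrete. In the sketch below (the 1-in-5,000 "jackpot" odds and the sample sizes are made up purely for illustration), small samples yield zero wins or wildly unstable estimates, and the estimate only settles near the true rate once many positive results have accumulated:

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

TRUE_RATE = 1 / 5_000  # hypothetical jackpot odds, chosen for illustration

def estimate_rate(n_tickets):
    """Estimate the win rate from n_tickets simulated ticket purchases."""
    wins = sum(random.random() < TRUE_RATE for _ in range(n_tickets))
    return wins, wins / n_tickets

for n in (1_000, 50_000, 1_000_000):
    wins, est = estimate_rate(n)
    print(f"{n:>9} tickets: {wins:>3} wins, estimated rate = {est:.2e}")
```

With around a thousand tickets we expect zero wins and can say almost nothing; with a million tickets we expect roughly 200 wins, and the estimate becomes trustworthy.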

There is a common misconception that the uncertainty in quantum mechanics is the ultimate cause of the uncertainty in the events of our daily life. When we measure the polarization of two identical photons, we can get different results. But if we give someone a drug she has had an allergic reaction to, odds are she will have another allergic reaction. The results of measuring photons are random, but the result of getting an allergic reaction from a drug is usually deterministic. When we say a drug is safe 99% of the time, we mean that 1 in a hundred people who use it will experience negative effects, not that anyone who uses the drug 100 times will get one allergic reaction. The majority of people using it will probably never experience any side effects, while a small number of people would get an allergic reaction every time. We cannot be certain when measuring quantum particles because we are limited by using photons as our measuring instrument. But the uncertainty in our daily life is generally due to the limitations of the models we use to predict the future. When a doctor prescribes a medicine, why does it work for some patients while triggering allergic reactions in others? It is because we do not have an accurate physiological model that includes all the factors needed to predict how each patient will react to the drug. It is like trying to calculate the top acceleration of a car given only the type of gasoline and the accelerations of 100 other cars of different sizes, shapes, and manufacturers, without knowing much about the engine or the gears of the car in question. We can, of course, try to build a more accurate physiological model that better captures human reactions to drugs. Many personalized medicine researchers aim to tackle this question, instead of developing drugs that will probably work for most people, most of the time.

Being able to make consistent measurements is crucial to the reproducibility of the scientific method, which I will discuss next time, along with the relationship between science and mathematics. As we have seen, error exists in all measurements. When we repeat the same experiment and get different results, instead of concluding that the results contradicting our hypothesis are caused by experimental error, we should think about how to redefine our measurements to incorporate the unavoidable uncertainties in future experiments, whether performed by us, other scientists, or even machines, so that anyone who repeats the same experiment can and will get consistent outcomes.


Before I end this episode, I want to leave you with a parable. A man was given a quantum coin. It uses nanotechnology that entangles itself with future events. He was told that by using it, he could predict the result of any future football game. But, due to the complexity of the coin, it has some quirks: it has to be tossed just the right way to make accurate predictions. The quirk is difficult to describe or measure; instead, he knew that he had used the coin correctly when the predictions came out right. So, he used the coin to predict 20 games; about 10 of them it predicted correctly, and 10 it did not. Because he knew the quirk of the coin, he knew that the 10 mispredictions were due to his manual error, and therefore those were outliers. He published a paper on this nanotech coin using the 10 games in which he knew he had used it correctly. The peers who reviewed the paper all marveled at the accuracy of the predictions. The paper was published, and people were inspired by the technological breakthrough of this coin. But does the coin really work?

Philosophy of Language: The Language Game

“The Sun rose from the East in the morning.” Is the sentence true? What does it mean for it to be true? Does it mean that the event the sentence describes has occurred, or that the sentence is a perfect representation of what happened? How can a string of symbols or sounds represent the physical world? Is it possible for a sentence to perfectly represent what has happened? What do we mean by “the East”? The Earth is roughly spherical; if we follow any direction long enough, we get back to where we started. Do we define “East” as the direction of the sunrise? Then doesn’t that just make the sentence a tautology: “The Sun rose from the direction the Sun rises in the morning”?

When we talk about true or false, we are almost exclusively talking about how accurate a representation is in comparison to what it is trying to represent. This applies not only to sentences but also to photographs, paintings, videos, and recordings. An event that has happened has happened; it cannot be true or false. But the words we choose to describe the event can be labeled true or false based on how accurate and objective our description of the event is. But what does an objective or accurate description mean? “The Sun rose from the East in the morning.” Is it accurate? Can a nine-word sentence accurately capture the complex trajectory of the Sun moving across the sky, and how its first light pierces through everything it touches and leaves the grass and the trees in absolute radiance? Is the description objective? There are literally thousands of visible stars in the sky, and each is the sun of its own solar system. What makes our Sun “the Sun”? Isn’t that a subjective choice we made based on the bias of our close personal relationship with our nearest star?

One of the most important works of modern philosophy and science is Ludwig Wittgenstein’s Philosophical Investigations. Its insight into how human language works has shaped how linguists study languages, how historians view history, and even how scientists discover the laws of our natural world. In this episode, I will dive into what Wittgenstein calls the Language Game, and discuss the meaning of truth and language’s relationship to reality. You can learn about how words get their meanings and the nature of mathematical language in my previous episodes on the Philosophy of Language.

Throughout most of human history, the common belief was that language has a special connection to reality. In Plato’s Republic, Plato proposed the theory of Forms. He believed that every general word was created by the gods along with a perfect object corresponding to that word in a higher reality. A specific table is an imperfect shadow of the perfect table in the higher reality created by the gods, and the word “table” corresponds to that perfect table. Plato believed this is true for all words and ideas. There is a perfect “Cat”, a perfect “Dog”, and a perfect “Society”, a utopia, in that higher reality, and we can figure out what they are through dialogue, hence the Republic. This theory only works because Plato and many other ancient philosophers and theologians believed the true reality never changes. All things and ideas that exist have always existed. Any thought we have, or sentence we speak, must correspond to something in that higher reality, and language has a special relationship with reality itself. In the Old Testament, the name of God is a special word that should not be uttered without reverence. During medieval times, when people believed in the power of magic and witchcraft, certain sounds or incantations were believed to have the power to alter reality. The word “bear” originally meant “the brown one”. It was believed that uttering the real name of the bear would summon one, and now we are only left with the bear’s substitute name. Even nowadays, we call certain words “curse words” because it was believed that speaking them would bring bad fortune.

The idea that language has a special connection to reality originated from the Platonic assumption that the higher reality itself never changes, and that our reality shifts only because it is a flickering shadow of the higher reality that we cannot perceive directly. But modern science has deeply challenged that assumption. We have discovered new stars from supernovas. Evolution has shown that cats and dogs have changed in appearance and behavior throughout history and will keep changing. Technological innovations have presented us with new gadgets and ideas that never existed before, and we invented new words to describe them. I doubt that Plato would know what “Netflix and Chill” means. So maybe the theory of Forms is wrong, and language does not have a fixed and predetermined relationship with reality at all.

Modern linguists have more or less agreed that the sounds and symbols we use to represent things are mostly arbitrary. I say mostly because for words representing certain sounds, the sound of the word usually shares some similarity with the sound it is trying to represent, like “meow”. Similarly, in pictographic languages, such as Chinese, words were originally drawings of the things they were trying to represent. But with time, those words became more abstract, and often completely lost their resemblance to the physical things. We don’t have any scientific evidence that certain words have magical powers, nor that they summon bears. Especially with the rapid extinction rate of species, it becomes less and less likely that words can summon wild animals. As for curse words, they only bring bad fortune because as a society we find those words unacceptable and therefore punish those who speak them. There are many English words that sound like bad words in Chinese, but nothing bad happens to people speaking them in English. Though they probably won’t make many friends if they use those words in China.

The Treachery of Images by René Magritte

The Treachery of Images is an oil painting created by René Magritte in 1929. The painting is a realistic depiction of a pipe, but underneath it, the caption says: “This is not a pipe”. The artist wants to remind us that no matter how realistic an art piece looks, it is not real, only a representation of reality through someone’s perspective.

There is nothing special about language. Like paintings, videos, photos, or audio recordings, it is just an abstract representation we use to communicate about the physical world. But unlike with other media, we commonly forget that about language. Like Plato, we often think that by engaging in a purely linguistic debate we can somehow reveal some truth about the world, which is what 99% of the internet is made of. Linguistic arguments are great when we already have all the information we need to come to conclusions. They help us eliminate contradictions in our beliefs and arguments. But they won’t help us if we do not have sufficient information. In those cases, we need to collect more data. We can’t figure out what food tastes like from a photo of the food alone, we can’t figure out whether Newtonian mechanics or Einstein’s relativity better captures the laws of the physical world by mathematical argument alone, and we can’t figure out which policy is best for the economy by engaging in endless political debates. But I do blame the ancient philosophers for our bad habit of seeking objective truth through pure verbal argument. Plato wrote a whole book, the Republic, in which Socrates debated other philosophers to find the best way to organize human society. Ancient Chinese Confucian writers believed the best way to learn about governing was to read books on governing written by older Confucian writers. Medieval scholastic scholars believed that the only way to gain knowledge was to read what ancient writers wrote about what they knew. By the way, our modern education system still very much resembles scholastic philosophy. We teach students to regurgitate sentences that the textbook writers wrote and test their abilities mostly on memorization instead of understanding. What it means to understand something is an extremely complex topic and the central question of epistemology. But I will leave it for another episode.
But for now, language is just an arbitrary collection of symbols and sounds we use to represent the physical world. Like the game of Monopoly, we made up the rules about which colors or shapes represent properties, jails, and characters, and which dice rolls trigger which events. Similarly, in our language game, we made up the rules about which letters and sounds represent the things around us, and which sequences of words represent relationships. Is it a “black cat”, or “gato negro”? Should the adjective come before or after the noun? We were taught to play this game when we were kids, and we accepted whatever rules were taught to us as natural, because they feel natural to us. But those rules are made up by humans. And just like there are different versions of Monopoly and poker in different places, we have different rules for what we consider the same language in different communities.

In the now-famous “Habitual Be” study, conducted by Janice Jackson from the University of Massachusetts Amherst, groups of black and white children were shown images from Sesame Street. In one of the images, the Cookie Monster was sick and lying in bed without any cookies, while Elmo stood nearby eating a cookie. When Janice asked the kids, “Who is eating the cookies?”, all the kids pointed to Elmo. But when she asked, “Who be eating the cookies?”, the white kids pointed to Elmo, while the black kids pointed to the Cookie Monster. The reason for the different choices is that in what we consider standard American English, “be eating” is not proper grammar, so the white kids took it to mean the same as “is eating”. But in African-American English, “be eating” and “is eating” are grammatically different. “Is eating” means the subject is currently eating, while “be eating” means the subject has the habit of eating. The Cookie Monster is usually the one eating the cookies on Sesame Street. So when asked “Who be eating the cookies?”, for African-American English speakers, the answer should be the Cookie Monster, not Elmo.

We have a tendency to consider the language we speak as the “correct” one. My automatic grammar checker is currently screaming at me for using “be eating”. For those who speak a language similar to ours but with some differences, it is easy to consider them less intelligent or less educated for the grammatical nuances we are not familiar with. But the fact is, language is a living phenomenon; there is no one correct language. Each linguistic community can have its own twists and house rules that differ from those of other communities. There is no one correct way to speak English. If you travel across France to Germany, you won’t find a line where everyone suddenly switches from French to German, but rather a gradual dialect shift from place to place: the farther north you travel, the more Germanic the language becomes. This flexibility of language is something we should celebrate, rather than trying to force everyone to speak the same way we do. Our language is far from complete; there are still many ideas and thoughts we can’t yet express. And we can’t even know what those ideas and thoughts are, because we can’t express them in our language. The flexibility and inventiveness of language allow it to evolve with our thoughts, culture, and accumulated knowledge. 500 years ago, human rights was not a concept that existed in our language, but nowadays we have developed a cornucopia of vocabulary and ideas surrounding human rights and equality, and almost everyone who grows up in the U.S. knows about them and can discuss them. In mathematics, we invented the language of calculus, which allowed us to discuss and examine concepts such as differentials and laid the foundations for modern physics. We should be careful not to dismiss ideas simply because they are unfamiliar and different. Instead, we should try to understand them as charitably as possible. Give them a chance and they might surprise us. After all, we do not know everything yet.

Even though the rules of language are somewhat arbitrary, and there are many different languages with different rules that can express similar ideas equally well, not all linguistic rules are capable of giving a language its meaning. When language is used for deception and manipulation, it is usually kept ambiguous and inconsistent, so that it can confuse the listener and create the illusion of successful communication without actually achieving it. But for a language to carry out meaningful communication, it needs to be logically consistent, at least to some extent.

If you walk into a bar and want to order a drink, but everything on the menu is called “The Special”, how can you order what you want? To carry out meaningful communication, a language should be as clear and unambiguous as possible. To be able to order the drink you want, not only must each drink have a unique name, but the name you think a drink has must also be the name the bar has given it. If you have listened to my previous episode, you will recognize this as the linguistic agreement test. “The Habitual Be” is valid grammar because, within the linguistic community that uses it, members consistently use “be doing something” to mean “someone has a habit of doing something”. If members of the community sometimes used it to mean “someone has a habit of doing” while at other times, in the same situation, used it to mean “someone is currently doing”, then the listener would not be able to differentiate the meanings, the grammar would have lost its power to communicate, and it would no longer be valid. Here “in the same situation” is crucial. As much as we want languages to be as clear and unambiguous as possible, it is simply not practical for every single word to have one meaning and one meaning only. That would require us to invent new words or compound words for every different experience we have, which is infinite and unique to each person, and then we wouldn’t be able to communicate at all. Instead, we take words that describe our existing experiences and use them in new places with some similarities to give them new meanings. “Mouse” used to refer only to the small furry animal, but now it also refers to the electronic device we attach to our computers. Yet when I say “My mouse has stopped working.”, you can be almost certain that I am talking about the electronic device, not the animal. But how?

Context. Because we give language meaning simply by using it to refer to something in a context, sentences are often extremely ambiguous by themselves, without the context in which they were spoken. The same words can have different meanings when used in different situations. A sentence can be either a request or a command based on how it is spoken: “Could you pick up the mail?” vs. “Could you pick up the mail!” We use rhetorical devices such as hyperbole or sarcasm to invert the literal meaning of a sentence. “I am sure you have literally never heard of any sarcasm before.” And yet, we seem to be able to understand each other most of the time; well, at least we think we understand each other most of the time. The question of what it means to understand something has haunted philosophers and teachers ever since we started teaching. It is a very important question, as we use language to pass on knowledge, but what if what we think we are teaching is not what the student has learned, due to the ambiguity of language? When we read the Constitution as the guiding principle for government, how can we know that what we think it means is actually what the writers meant? I encourage you to think about this question, as it is one of the most profound and important questions in epistemology. All the knowledge we learn depends in some way on the answer to this question. I will leave it for you to contemplate for now and get back to it in a later episode. But for clarifying the meaning of a sentence and creating a verbal agreement, we play the language game of “question and rephrase”. We learn this game when we are fairly young: we try to rephrase a sentence someone else has said, and if we do not have enough information, we ask more questions for more context. We repeat until the other person agrees that what we rephrased is what they meant.
Note that this process is completely subjective, and simply because we agree that we understood each other doesn’t mean we actually do, as anyone who has ever had a misunderstanding can testify.

Wittgenstein pointed out that all languages are private; that is, we cannot communicate to others what a sentence truly means to us. He gave the following example: consider a group of people, each holding a box with something in it. The only person who can see what is inside a box is the person holding it, so no one knows what is in everyone else’s box. Everyone refers to what is inside their box as “a beetle”. Just because everyone agrees on the word used to refer to what is in the boxes does not mean everyone has the same thing in their box. Similarly, just because we agree on the same sentences for describing the same event does not mean our understanding of the event is the same. What it means to understand something is far more complicated than just agreeing on the words we use to describe it. But I will leave this point for later.

Now, coming back to the question: “The Sun rose from the East in the morning.” Is the statement true? There is a sentence I like a lot: “you can no more change history than change the fact that there are 9 planets in the solar system.” 20 years ago, if you had said that there are 8 planets in the solar system, people would probably have laughed at you and questioned your education. But now, if you say there are 9 planets in the solar system, some may say that your education is out of date. But how can the two sentences “There are 9 planets in the Solar System” and “There are 8 planets in the Solar System” both be true, just at different time periods? 8 is certainly not equal to 9. The planets are still following the same orbits, and have roughly the same shapes and sizes. What has changed isn’t the physical reality, but our definition of the word “planet”. Under the old definition, there were 9 planets in the Solar System. Now, under the new definition, there are only 8.

Let’s consider the word “democracy” and what it refers to. Ancient Athens was famous for its proud democratic system. But in their cities, women and slaves had no right to vote or speak in public symposiums. How could they call themselves a democracy while the majority of the population couldn’t vote? Just about 100 years ago, women in the U.S.A. finally won the right to vote, and many countries didn’t extend voting rights to women until after WWII, while the West has been boasting of its democratic system for centuries. But we are different now, right? Today, everyone who deserves the right to vote has the right to vote, correct? Yet citizens under the age of 18 still cannot vote, immigrant workers cannot vote, and prisoners convicted of felonies cannot vote. You may argue that they don’t deserve the right to vote. But who “deserves” the right to vote depends heavily on our definition of “democracy”. All societies that consider themselves democratic believe that they only exclude those who do not “deserve” the democratic right. Politicians in ancient Athens had lengthy arguments on why women and the enslaved should not participate in politics. So did the founding fathers of the United States. They considered their political system democratic precisely because they believed that all people who deserved the right to vote got to vote. But who decides who is deserving and who is not? Taking teenagers as an example, a common way to justify excluding them from the democratic process is to say that they are not mature or knowledgeable enough to participate in the political process. But there is a fundamental problem with this argument: we can’t really measure that objectively. If we had an objective way to decide which bill or which candidate is better, then we wouldn’t need voting at all, would we? Some may argue that they simply do not have enough education to make good and sensible choices.
Besides the fact that we have no objective way to determine what is good and sensible, the argument itself is also problematic. If this argument is enough to ban teenagers from voting, shouldn’t it also be enough to ban anyone who does not have enough education from voting? That is certainly not very democratic, is it? The truth is, there are no really good logical reasons for preventing teenagers from voting, besides that being part of the definition of democracy we have accepted in our language, and we were just trying to justify that definition. In the future, maybe in a society that allows teens to vote, when they look back at us, they will find our definition of democracy hypocritical, just as we do with the societies before us. The fact is, it is not that our political system is democratic, but rather that we are defining the word “democracy” based on our current political system, because we consider it the paradigm case of the word “democracy”.

This is a harrowing fact. The same sentence, true one day, became false the next. Not because our physical reality somehow changed, but because the meaning of the words was redefined. A sentence that we consider objectively true today may turn out not to be so true tomorrow, just because we have given the words a new meaning. If so, is the sentence really objectively true? Can any sentence be objectively true?

Most modern philosophers say no: no sentence can represent objective reality, not even this sentence itself. Some people interpret postmodernism as the denial of the existence of objective reality itself. But that is not accurate; many postmodern philosophers believe in the existence of objective reality, but they don’t think language or any other medium can represent that reality perfectly or objectively. The idea that our language is only a representation of reality, not reality itself, is actually not new. The medieval Christian philosopher Thomas Aquinas postulated that everything we assert about God is mere “analogical predication”, i.e., an analogy, or representation, of God’s attributes, and that we cannot say anything objectively true about God’s nature. What words we choose, what meanings we give to those words, or, if we are using photographs to capture reality, how the photograph is framed and whom we choose as subjects, are all subjective choices based on our values. So instead of insisting that our version of reality is the objective one and forcing everyone to agree with us, we should present it as simply one of many interpretations of the facts we have at hand.

That being said, simply because all sentences are subjective does not mean all statements are equally valid. For instance, whether there are 8 or 9 planets in the Solar System is up for debate, depending on the definition of the word “planet”. But if I say Mars is not a planet because it is not really orbiting the Sun, that is false, because there is no agreed definition of the words “planet” and “orbiting” that allows this sentence to be logically consistent with our perceived reality. Maybe one day the sentence will become true. But at least for now, “Mars is not a planet because it is not really orbiting the Sun” is a false statement, because it is logically inconsistent with the existing definitions of the words and our perceived reality.

Logic is what gives the word “truth” meaning. In our minds, we all hold a collection of beliefs that we consider true. Some are based on our experience; for example, we know that touching fire hurts because we have all been that kid before. Some are based on what we have been told, like that breakfast is the most important meal of the day. For some, we don’t even remember why we believe them. Why can we exchange money for food, when money is essentially just cheap paper or digital data on some remote server? When we receive new information or experiences, we are constantly, consciously or unconsciously, trying to fit them into our existing belief system, using some version of reasoning to assign them truth values. If what we hear or experience contradicts what we believe, we can either consider what we hear wrong, or reflect on whether what we believe needs to be adjusted. Though, in most cases, we just dismiss what contradicts our existing beliefs as false. But note that the word “false” here is the result of the contradiction we found. Logic and reason are important because they give the words “true” and “false” their meaning. We use logic and reason to determine the truth value of statements within a universe of discourse or a belief system. Logic consists of the methods of reasoning that will not lead to contradictory results, while fallacies are reasonings that can lead to contradictory results. The topic of what is logic and what are fallacies is a long but important one that I will have to leave for a later episode. But this is important about fallacies: in a belief system where a statement can be both true and false at the same time, you can use reasoning to show any statement to be both true and false at the same time. The word “true” does not matter anymore. For people who do not care about logical consistency, the word “true” often just refers to what they want to believe.
If everything is both true and false at the same time, then we can just choose an arbitrary set of things to believe that benefits us the most. And everyone can do the same and believe whatever they want, and all beliefs are equal. In a society that is not founded on logic and reason, there is no such thing as truth, only power. The strong can force the weak to serve them as they please.

Thanks for listening. I hope this episode has helped you a bit in understanding how language works and how to be better at communication. I have repeatedly used the word “logic” throughout the podcast. I know that most people won’t agree on the definition of “logic”, because it is quite common for two humans to look at the same evidence and come to different conclusions. In the next episode, I will give my definition of “logic”, and we will see if you agree or disagree.

Scientific Method, Conditional Probability, and p-value

In the last episode, I discussed the basics of probability. To recap, probability is a measurement of a characteristic of a population, not of individuals in the population. If we have a bag of marbles, half of which are red and the other half blue, then when we draw one out of the bag, and all marbles have the same chance of being drawn, we have a 50% chance of getting a red marble. That is a measurement of the bag. It does not tell us the color of a marble given that it came from the bag, because a marble is either red or blue; it cannot be 50% of each.

When we get a marble out of the bag, even though we cannot draw conclusions about the color of the marble, we can form beliefs about it. This is called the Bayesian view of probability. We call them beliefs instead of conclusions because we cannot guarantee that what we say about the marble is correct every time. Consider the following experiment: I draw a marble out of the bag, guess its color, then put it back, and repeat. If I say the marble is red, 50% of the time I will be right. Similarly, if I say the marble is blue, 50% of the time I will also be right. So I can say that I have 50% confidence that the marble is red, and 50% confidence that the marble is blue, though I cannot say for sure what the color of the marble is. The way to verify my confidence is to conduct the experiment above, repeatedly drawing marbles from the bag again and again, and, once we have collected enough data, to see if the ratio of the two colors converges to 50-50. Note that we have to redraw a marble every time, so that we do not examine the color of any specific marble, but the color of every marble drawn from the bag. Once again, probability is a characteristic of the bag, or population, not of any given marble. If we keep examining the same marble, then we will always get red if the marble is red. Then, if our guess was blue, we would be wrong all the time, instead of being right half the time. Therefore, our 50-50 confidence is not about any specific marble, but about the fact that, if we draw marbles repeatedly from the bag, we will be right 50% of the time whether we guess red or blue.
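The verification experiment above can be sketched in a few lines of code. This is a minimal simulation, assuming a hypothetical bag of 50 red and 50 blue marbles and a fixed random seed for reproducibility:

```python
import random

random.seed(0)  # assumption: fixed seed, purely for reproducibility

# A hypothetical bag: 50 red and 50 blue marbles.
bag = ["red"] * 50 + ["blue"] * 50

# Draw with replacement many times, always guessing "red",
# and track how often the guess turns out right.
draws = 100_000
correct = sum(1 for _ in range(draws) if random.choice(bag) == "red")

print(correct / draws)  # converges toward 0.5 as draws grow
```

Drawing with replacement (`random.choice` on the full bag every time) is what makes this a measurement of the bag rather than of any one marble.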

Now consider a new bag of marbles, with three different colors instead of two, say red, green, and blue. If I draw one out, by similar reasoning, we can say that we have 1/3 confidence that it is red, 1/3 confidence that it is green, and 1/3 confidence that it is blue. What if, after the draw, I first check the marble for you and tell you that it is not green? What should your confidence be? Because we know that the marble can now only be red or blue, and they have an equal chance of being drawn from the bag, our confidence should be 50% red and 50% blue. This is all well and good as reasoning. But in the spirit of the scientific method, we also need to verify that it is correct. How can we design an experiment to check our confidence? We can repeat a similar experiment to the one above, with some modifications. We take a random marble from the bag; if the marble is green, we put it back directly and do not count the sample. If it is not green, we count whether it is red or not red. Afterward, we put it back. We repeat this process until we have a large enough sample, then we compute the ratio between the numbers of red and not-red (in this case, blue) marbles that we counted, to see if it is close to a half-half split. This is the formal way to verify our conditional confidence. We can call it post-sampling filtering, because it still takes samples from the whole population, just as when we verify a regular probability, and filters the samples afterward by removing those that do not satisfy the condition; in this case, the green samples. Besides post-sampling filtering, there is another common process that can also get us the correct answer sometimes, if used properly: the pre-sampling filter.
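The post-sampling filtering procedure can be sketched as follows. This is a minimal simulation, assuming a hypothetical bag with the three colors in equal proportion:

```python
import random

random.seed(0)  # assumption: fixed seed, for reproducibility

# Hypothetical bag: equal numbers of red, green, and blue marbles.
bag = ["red"] * 30 + ["green"] * 30 + ["blue"] * 30

red_count = 0
kept = 0
for _ in range(100_000):
    marble = random.choice(bag)  # draw with replacement from the full bag
    if marble == "green":
        continue  # post-sampling filter: discard the sample AFTER drawing
    kept += 1
    if marble == "red":
        red_count += 1

print(red_count / kept)  # estimate of P(red | not green), close to 0.5
```

Note that every draw still comes from the original three-color bag; green draws are simply thrown away after the fact, which is exactly what makes this a measurement of the original population.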

The pre-sampling filter, as the name suggests, filters before we take samples. We can first remove all of the green marbles from the bag, and then start taking random samples from the bag and calculating the ratio between red and non-red samples. In this case, we know that we will never get any green marbles, as we have already removed them from the bag; therefore, the ratio between red and not-red samples is the conditional probability of a marble drawn from the original bag being red, given that we know it is not green. I want to note that the pre-sampling filter changes the population. After removing all of the green marbles from the bag, we are no longer sampling from the same bag, and therefore we are not answering the same question. The original question was about a random marble taken from a bag with mixed green, red, and blue marbles. But the bag we are sampling now only has red and blue marbles. The pre-sampling filter has changed what we are measuring. It just happens to give the same answer to the original question, as long as removing the green marbles does not influence the relative chances of the red and blue marbles being drawn.
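By contrast, a pre-sampling filter would look like this (same hypothetical equal-mix bag as before; the comment marks the assumption the method silently relies on):

```python
import random

random.seed(0)  # assumption: fixed seed, for reproducibility
bag = ["red"] * 30 + ["green"] * 30 + ["blue"] * 30

# Pre-sampling filter: remove every green marble BEFORE sampling.
# This changes the population we sample from; it only answers the
# original question if removing greens doesn't bias red vs. blue.
filtered_bag = [m for m in bag if m != "green"]

draws = 100_000
red = sum(1 for _ in range(draws) if random.choice(filtered_bag) == "red")
print(red / draws)  # also close to 0.5, under that assumption
```

Here the code never touches the original bag again, which is the whole point: the number it prints agrees with the post-sampling estimate only because, in this idealized simulation, dropping the greens leaves red and blue equally likely.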

In the marble example, it may seem obviously safe to assume that removing the green marbles won’t affect the relative chances of the red and blue marbles being drawn. After all, how could removing the green marbles make red marbles more or less likely to be drawn than blue ones? But even though it is intuitive to just assume this, it is generally bad scientific methodology to make assumptions without evidence or data. For instance, some red marbles could have been removed by accident if the person performing the task suffers from red-green colorblindness. There are also far less obvious examples of potential errors in the pre-sampling filter when measuring conditional probability.

Consider three products on the market that serve similar functions. They may be phones, computers, or paper towels, but they are of different qualities and prices. You want to conduct an experiment to see the ratio between people who buy the lowest-tier product and those who buy the top-tier product. Because you don’t really care about the mid-tier product, you only ask your participants to choose between the lowest and top tiers. It may seem sensible to conduct the experiment with only the options you care about. But in fact, removing the mid-tier product from the market actually changes what people buy. This is what market psychologists refer to as the decoy effect. Dan Ariely used the following real-world example in his book Predictably Irrational:

The Economist magazine offered three types of subscription: the cheapest option was 59 dollars a year and provided only online access. The middle option was 129 dollars a year and provided only print. The top-tier option provided both print and online access for 129 dollars; yes, the same price as the middle tier but with online access thrown in. It may seem that no one would ever choose the middle tier, as it is the same price as the top tier but offers fewer benefits. And you would be right. The middle option was not presented for people to buy, but to show how good a deal the top-tier option is, and therefore to incentivize people to purchase it. Indeed, Ariely conducted his own experiment among his students: when presented with only the low-tier and top-tier options, students leaned more toward purchasing the lower tier, maybe because of their limited budgets; but once the middle tier was introduced into the mix, more people chose the top tier than when there was no middle-tier option.

Therefore, while it may seem sensible to remove options we don't care about before we sample, in this case removing the mid-tier option actually changed the probability distribution over the remaining options and distorted our answer. Instead, we should perform a post-sampling filter: sample with all three options first, and only afterward remove the samples we do not care about.

Here is another hypothetical example of sampling bias introduced by a pre-sampling filter. Let's say we want to measure low-income families' access to broadband internet. To keep high-income families from being sampled, we conduct our surveys in low-income neighborhoods around Los Angeles. This filter does exclude the wealthy. But it also introduces sampling bias, because not all low-income families, especially those in rural regions of the U.S., have the same internet access as low-income L.A. neighborhoods.

I hope you can now see that not all samples are created equal. Probability is about more than just data; it is also about how we collect it. When it comes to conditional probability, the post-sampling filter is the true measurement of the empirical conditional probability, even though it means collecting more data than is sometimes needed. The pre-sampling filter assumes that the filter we applied does not distort the underlying probability, which is often hard to guarantee. Instead of just assuming this, we need to collect data with both the pre-sampling and post-sampling filters and show that they yield the same results.


To recap: conditional probability is the ratio between the number of positive samples and the total number of samples, given some condition that all samples have to satisfy. In the marble example, the positive samples are the red marbles drawn from the bag, and the condition is that a sample is not a green marble. So the total number of samples satisfying the condition, i.e. not green, is the number of red or blue marbles drawn.
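This recap can be sketched as a short simulation of the post-sampling filter. The text never gives the exact contents of the bag in the green-marble example, so the counts below (5 red, 3 blue, 2 green) are illustrative assumptions:

```python
import random

# Hypothetical bag: the 5/3/2 split is an assumption for illustration
bag = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2

random.seed(0)
draws = [random.choice(bag) for _ in range(100_000)]  # sample first...
kept = [m for m in draws if m != "green"]             # ...filter afterward

# Empirical conditional probability P(red | not green):
# positive samples (red) over all samples satisfying the condition (not green)
p_red_given_not_green = kept.count("red") / len(kept)
print(round(p_red_given_not_green, 3))  # close to 5/8 = 0.625
```

With these counts, the analytic answer is 5 red out of 8 non-green marbles, and the empirical ratio from the post-sampling filter converges to it as the number of draws grows.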

Now that I have discussed conditional probability, I will introduce null-hypothesis testing and the p-value, which are the bread and butter of so-called data-driven science. I want to remind you once again that probability is a characteristic of a population, not of any single member of the population. For example, the probability of drawing red balls from a bag is the ratio between the number of red balls drawn and the total number of balls drawn in a repeated sampling process: take a ball out, see whether it is red, put it back, and repeat. It does not, however, tell us whether a particular ball is red given that it is from the bag, unless the probability is 100% or 0%; and in those cases, we don't really need probability for the computation anymore. I am emphasizing this again because it will be crucial when we discuss the probability of a hypothesis being true later.

A hypothesis is just a proposition about a group whose truth value we want to test. For example, "all swans are white" is a hypothesis. In the falsifiability episode, I discussed extensively how testing hypotheses is one of the pillars of the scientific method as proposed by Karl Popper. One of the keys to differentiating science from pseudoscience is that scientists try to prove their hypotheses false, while many pseudosciences try to prove their hypotheses true. It may seem strange to try to tear down your own hypothesis, but there is a good reason: science studies universal laws, so that we can use those laws to predict and manipulate the future. Consider the hypothesis: all swans are white. There is no way for us to prove, logically, that this statement is true, as that would require examining all swans past, present, and future and making sure they are all white. But we can easily prove it false, simply by finding a non-white swan. It doesn't have to be black; any color that is not white would suffice. So instead of trying to find all the white swans in the world to confirm our hypothesis, the scientific method asks us to try our hardest to find swans that are not white, and only upon failing to find one may we consider, not prove, our hypothesis to be true. In contrast, many pseudoscientific claims simply find some data that supports their hypothesis and declare that the positive data has "proven" it, while ignoring the negative data.

The hypothesis is the proposition whose truth value we want to test, but what is the null hypothesis? There is a broad misconception that the null hypothesis is the negation of the hypothesis. For example, if I hypothesize that a drug will improve test scores, the null hypothesis would supposedly be that it does not help improve test scores. That may sound perfectly sensible in English, but logically it does not hold water. The negation of a statement means that the two statements cannot both be true at the same time, and that if one is false the other must be true. For instance, the negation of "all swans are white" is "some swans are not white", not "all swans are not white": if some swans are grey and some are white, then both "all swans are white" and "all swans are not white" are false. In our case, is my hypothesis that the drug will always improve test scores, or only that it sometimes will? The negations of the two are not the same. If my hypothesis is that the drug will always improve test scores, its negation is that the drug sometimes fails to improve, or even decreases, test scores. If my hypothesis is that the drug will sometimes improve test scores, its negation is that the drug never improves test scores. In reality, the null hypothesis is not a negation of the hypothesis at all, but the hypothesis that two groups of samples come from the same population.

For instance, if you have two groups of test takers taking the same test, you get two sets of test scores, one from each group; the null hypothesis is that the two sets of scores come from the same population. You may have noticed that the null hypothesis has nothing to do with the drug at all! Even if we can show that the two groups of samples are not from the same population, we can't really be sure that the drug caused the difference. Correlation is not causation. For example, suppose you are looking into a marketed drug that supposedly improves test scores, so you send out a survey to students who bought the drug, ask about their scores, compare them to the average score of students who didn't take the drug, and find that the students who bought the drug have a higher average score. Assuming this evidence is enough to conclude that the null hypothesis is false, which I will show later it is not, what it tells us is that the students who take the drug are a different population from the students who don't; it does not tell us that the drug actually improves scores. There could very well be other reasons why the two groups differ. For instance, maybe the group who bought the drug is significantly wealthier, since they can afford it, and therefore has access to better education. A controlled experiment can eliminate some of these confounding factors. But to make such experiments satisfy the criteria of the scientific method, observability, falsifiability, and reproducibility, we will have to reconsider them in a framework different from null-hypothesis testing and the p-value. I will get to that later. Let's talk about the p-value first.

The p-value is the conditional probability of the data given that the null hypothesis is true. A couple of things. First, the p-value is a conditional probability, so to measure it we would need the post-sampling filter I discussed above. I will get to measuring the p-value in a bit. Secondly, because we want to show the null hypothesis to be false, and although knowing the null hypothesis is false doesn't really show our original hypothesis to be true, as noted above, let's put that aside for now, we want the p-value to be small. Intuitively, a small p-value means the observed data is unlikely if the null hypothesis is true, and therefore we can reject the null hypothesis. I said intuitively, not logically, because this reasoning is not mathematically sound. It may sound like the falsifiability principle of the scientific method, but there is a key difference: impossible events can never happen, but unlikely events happen all the time. If I hypothesize that all swans are white, finding a black swan is enough to put a nail in the coffin of my hypothesis. But if I hypothesize that 99% of swans are white, finding a black swan doesn't really say much about the validity of my hypothesis. It is hard to win the lottery, yet people around the world win lotteries every day; not always the same person, just some people each day. I will elaborate later, when I talk about p-hacking, on how falsely thinking that a small p-value means the null hypothesis is false can lead to problems. But before that, let's see how we can measure the p-value.

So how can we measure the p-value? Well, we would have to use the post-sampling filter mentioned above: collect samples from the whole population, then filter out the ones for which the null hypothesis is false. You may have noticed a problem here. If our hypothesis is about a universal law of nature, the null hypothesis is either true or false; it is not possible for some samples to be taken while the null hypothesis is true and others while it is false. Just to show the absurdity of the logic, let's say our null hypothesis is that gravity does not exist on Earth. To measure the p-value, we would just need to throw objects while standing on the ground and count how many times the object flies into space versus how many times it lands on the ground while there is no gravity on Earth. That is of course not achievable: either there is gravity on Earth or there is not. We can't take samples for both cases. Similarly, if the null hypothesis is true, all of our samples are taken under the law of the null hypothesis, while if it is false, all of our samples are taken under some alternative law. There is no way to take samples under two alternative physical laws, as that would contradict our fundamental scientific assumption that the universe is bound by a set of fixed and consistent laws that do not change with space or time. Therefore, we can never really measure the p-value; we can only calculate it, with a collection of assumptions.

The computation of the p-value is pretty straightforward, but I won't go into the equations, as that is not the point here. Instead, I want to dive deeper into the assumptions we make while doing the computation. As I have emphasized many times before, when making scientific inquiries, assumptions without justification and verification can introduce biases into our reasoning and, as a consequence, taint our conclusions. More dangerously, we often embed our own assumptions about the world into our arguments without knowing it, making us think we are being objective while in fact we are arguing from subjective assumptions that often have no scientific or logical standing. Therefore, it is important to recognize the assumptions we make when we reason, and instead of taking them as given, find ways to back them up with evidence and the scientific method.

OK, the computation of the p-value. Let's consider our smarty drug again. We have two groups of people; one takes the drug, the other doesn't. Both take the same test, and each produces a separate score distribution. Our null hypothesis is that the two groups of scores are drawn from the same distribution, so we can use the group that did not take the drug as the baseline distribution, then calculate the chance of getting the set of scores of those who took the drug if they were sampled from the baseline distribution. But there is a problem: any specific set of scores is, in general, statistically improbable for any large enough sample size. For instance, if I want to test whether a coin is fair and I flip it 6 times and get HTHTHT, the coin looks fair, yet the chance of getting this exact sequence is only 1 in 64, not very likely. Therefore, instead of looking at the likelihood of the specific sequence of data, the p-value looks at a summary statistic, just as the name suggests, a summary of the data, such as the average score or the average head count. So here is the first assumption we are making: that the summary statistic is sufficient to represent the important information in the data. For example, if we choose the average score as our summary statistic, we overlook other information such as variance. A drug may raise the average score by 5 percent, yet make some people fail the exam because it gives them massive headaches. The second assumption is that the sampled data is enough to capture the underlying distribution. For instance, if I want to compare whether two coins are identical, and I throw the first coin 5 times and get 5 heads, then using it as a baseline I will construct the underlying distribution as one that always gives heads, while the coin may be perfectly fair and I simply got very unlucky with my throws.
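The computation just described can be sketched as a permutation test, one standard way to compute such a p-value: under the null hypothesis, the group labels are interchangeable, so we shuffle the pooled scores many times and see how often the difference in average scores is at least as large as the one observed. The scores below are made-up numbers for illustration:

```python
import random

# Hypothetical test scores; these numbers are made up for illustration
control = [70, 72, 68, 75, 71, 69, 73, 70]  # no drug
drug    = [74, 78, 72, 80, 76, 75, 79, 77]  # took the drug

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(drug) - mean(control)  # summary statistic: difference of means

# Under the null hypothesis the group labels are interchangeable, so we
# shuffle the pooled scores and recompute the statistic many times.
random.seed(0)
pooled = control + drug
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:len(drug)]) - mean(pooled[len(drug):])
    if diff >= observed:  # at least as extreme as what we observed
        count += 1

p_value = count / trials
print(p_value)  # a small value: such a gap is rare if the null were true
```

Note that this calculates, rather than measures, the p-value: it bakes in the assumptions discussed above, that the difference of means is a sufficient summary and that the pooled sample captures the underlying distribution.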
You may think that a large sample size would alleviate the issue of unlucky throws, because it is far more likely to throw 5 heads in a row than 100 with a fair coin, and that is a correct assessment. But a large sample size introduces sensitivity issues. For instance, we won't be surprised if the average exam score of 5 people is 10 percent higher than the population average. But if we take 5,000 people's exam scores, we would expect them to be pretty close to the population average if they are taken randomly from the population; even a 1 percent difference may be considered unlikely, or statistically significant. Very small errors in sampling, in computation, or in the approximations behind our assumptions can then lead to statistically significant results. Therefore, we should look not only at how likely the summary statistic is, but also at how different the two data sets actually are, using additional metrics such as effect sizes.
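This sensitivity issue can be shown with a back-of-the-envelope calculation. The population mean and standard deviation below are assumed for illustration; the point is that the same small gap goes from invisible to "highly significant" purely because the sample grows:

```python
import math

# Illustrative population: mean score 70, standard deviation 10 (assumed)
pop_mean, pop_sd = 70.0, 10.0
observed_diff = 0.7  # a 1% difference from the population mean

for n in (5, 5000):
    se = pop_sd / math.sqrt(n)  # standard error of the sample mean
    z = observed_diff / se      # how many standard errors away the gap is
    print(n, round(se, 2), round(z, 2))

# n=5:    se ≈ 4.47, z ≈ 0.16  -> a 1% gap is lost in sampling noise
# n=5000: se ≈ 0.14, z ≈ 4.95  -> the same 1% gap looks "highly significant"
# The effect size (0.7 points, i.e. 0.07 standard deviations) is identical.
```

The statistical significance changed dramatically, but the practical size of the difference did not, which is exactly why effect sizes should be reported alongside p-values.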

You now know that the p-value is how likely the summary statistic is, such as the difference in average score between two groups of test takers, if the two groups of samples are taken from the same population. The p-test, or null-hypothesis testing, is the practice of using the p-value to either reject or retain the null hypothesis. For instance, if we choose a threshold of 0.05, or 5%, then whenever the p-value is below the threshold, we claim that we can reject the null hypothesis. The word "reject" is in fact extremely misleading. Many people, including a lot of researchers, take "reject" to mean that we have proven the null hypothesis false. This is absolutely wrong. As mentioned before, proof by contradiction is mathematically sound: if I postulate that all swans are white, finding a black swan is enough to show I was wrong. But if I postulate that most swans are white, finding a black swan can't really prove me wrong. Similarly, finding a dataset that is unlikely under the null hypothesis can't prove that the null hypothesis is false either. Then, if a low p-value doesn't show that the null hypothesis is false, can it at least show that the null hypothesis is likely to be false? Again, no. The null hypothesis is either true or false in this universe. Making statements about the likelihood of the null hypothesis is like making statements about the likelihood of the existence of gravity, which does not make any physical sense. Furthermore, conditional probability is not symmetric. For instance, the chance of me having coffee given that I am at Starbucks is not the same as the chance of me being at Starbucks given that I am having coffee. I could be drinking a lot of coffee at home, but whenever I visit Starbucks, I always order coffee. So the chance of me having coffee given that I am at Starbucks is really high, while the chance of me being at Starbucks given that I am having coffee is pretty low, since most of the coffee I drink is at home.
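The asymmetry in the Starbucks example can be made concrete with some made-up weekly counts:

```python
# Made-up weekly counts, purely to illustrate the asymmetry
coffees_at_home      = 12  # coffees drunk at home
coffees_at_starbucks = 2   # coffees drunk at Starbucks
starbucks_visits     = 2   # every Starbucks visit involved a coffee

# P(coffee | at Starbucks): of all Starbucks visits, how many had coffee?
p_coffee_given_starbucks = coffees_at_starbucks / starbucks_visits

# P(at Starbucks | coffee): of all coffees, how many were at Starbucks?
p_starbucks_given_coffee = coffees_at_starbucks / (
    coffees_at_home + coffees_at_starbucks
)

print(p_coffee_given_starbucks)            # 1.0
print(round(p_starbucks_given_coffee, 3))  # 0.143
```

Same two events, very different conditional probabilities, which is exactly why P(data | null hypothesis) cannot simply be flipped into P(null hypothesis | data).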

And last, even if we could prove the null hypothesis false, it still would not mean our original hypothesis is true. For instance, in the smarty drug example, even if we could prove that the two datasets are not taken from the same population, which is the null hypothesis, we still could not conclude that the smarty drug caused the difference. All the data has shown is a correlation between the smarty drug and test scores. There could be other factors that actually contributed to the difference. For example, it could be that those who can afford the smarty drug are generally wealthier and therefore have access to better education.

Even though the p-value test is riddled with logical problems, it is rather popular in academia because it makes it easy to get work published. If we choose a p-value threshold of 0.05, or 5%, which is the common threshold at most journals, it means that 5% of the time, even when the null hypothesis is true, we can get data that passes the threshold. Therefore, I just need to test enough hypotheses, no matter whether they are true or not, and I can get 1 in 20 published. The chance increases further if I try different summary statistics or do some data massaging, for instance throwing away some outliers or trying different analysis strategies to see which one gives me a more favorable p-value. Furthermore, even if others can validate my calculation by following my steps, they won't be able to verify my p-value with any real-world measurement, since, as mentioned above, the p-value can't be measured. This practice is commonly referred to as p-hacking. P-hacking is not necessarily the result of a researcher deliberately gaming the system; most of the time it stems from a misunderstanding of the mathematical meaning of the p-test, since many classes teach that a low p-value is evidence proving one's hypothesis true. Worse, we often can't see the whole scope of the p-hacking problem because of publication bias. We only see the papers published with low p-values, not the studies that showed no promise. Suppose I want to test different candies' impact on test scores: I test 20 different candies, one shows up as statistically significant, i.e. with a low p-value, and I publish a paper with that one result while ignoring all the others. To the public, it would appear that I tested only one type of candy and got a good result, so it is easy for them to interpret the candy as having a positive effect on test scores. But if they knew about the 19 failed tests I had buried, they would be far less impressed and would recognize that one positive result is most likely just chance.
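A quick simulation shows how testing many hypotheses almost guarantees a "significant" result even when nothing works. The distributions and sample sizes here are arbitrary assumptions; the crucial point is that every null hypothesis in this simulation is true, yet most studies still find a "hit":

```python
import random

random.seed(1)

def mean(xs):
    return sum(xs) / len(xs)

def one_experiment(n=30):
    """One experiment in which the null hypothesis is TRUE: both groups
    are drawn from the very same normal distribution (mean 70, sd 10)."""
    a = [random.gauss(70, 10) for _ in range(n)]
    b = [random.gauss(70, 10) for _ in range(n)]
    se = (10**2 / n + 10**2 / n) ** 0.5  # known-variance z-test
    z = abs(mean(a) - mean(b)) / se
    return z > 1.96                      # "significant" at the 0.05 level

# If each study tests 20 candies and publishes any hit, how often does
# at least one candy look significant purely by chance?
studies = 2_000
hits = sum(any(one_experiment() for _ in range(20)) for _ in range(studies))
print(hits / studies)  # roughly 1 - 0.95**20, i.e. about 0.64
```

So even with nothing but useless candies, roughly two thirds of such 20-candy studies would produce a publishable low p-value if only the hits are reported.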

So I have just painted a pretty bleak picture of the p-value. It is, in fact, very problematic in modern scientific journals. More and more scientists have come to agree that there is a reproducibility crisis in science, largely attributable to misconceptions about and misuse of p-value testing. But this is not an inherent problem with statistics. We can still use statistics in science; we just need to use it under the principles of the scientific method: observability, falsifiability, and reproducibility.

A lot of people think that science studies causality. But in fact, we can't really observe causality. What we can observe is correlation: we see event A happen and then event B happen, and in a different experiment event A doesn't happen and B doesn't happen either. Instead of relying on something we cannot measure or observe, like the p-value, we should design experiments around standardized, repeatable measurements. If three different researchers each have a different definition of efficacy, and they all measure the efficacy of the same drug, but one gets 90%, one gets 30%, and one gets 10%, we can't meaningfully conclude much about the drug's efficacy. If we use statistics in our experiments, we need to make sure that what we measure statistically is well defined and can be repeatedly measured by different researchers. This means that things that have been measured only once can't be called scientific, because we cannot know whether the measurement is consistent or repeatable.

Regarding falsifiability, researchers should design experiments to test the predictions made by their hypotheses, not do analysis after data collection with the aim of rejecting the null. If I believe my smarty drug has 50% efficacy, I should first define how to measure efficacy. For instance, with two randomized groups of participants, one group takes the drug, the other takes a placebo, both take the same tests afterward, and the efficacy is the average score of the smarty-drug takers over the average score of the placebo takers. Then I test my prediction to see whether the measured efficacy is 50% or higher. We also need to make sure such a measurement of efficacy is consistent and repeatable by performing it multiple times with different groups. If the data contradicts my hypothesis, then my hypothesis is wrong. But if the data supports the hypothesis, we have gained confidence in its validity. Remember, we can never prove our hypothesis to be true.

Finally, and most importantly, there is reproducibility. One experiment is not enough. The scientific motto is Nullius in Verba. It is not enough for me to get a positive result; if my hypothesis is correct, other people should get the same result as well. Reproducibility is arguably the most important principle of the scientific method. We can't be confident in the validity of a scientific theory unless many people can reliably use the theory to make predictions and verify the results. But unfortunately, in the current research culture, reproducibility is often completely ignored. It is a lot easier to get attention and funding by creating a new theory that passed the p-value test once than by trying to verify someone else's hypothesis. I would love to say that people do science because they want to gain a better understanding of the universe. Many scientists do. But they also need to pay rent, keep the lights on, and maintain their research positions. As long as the culture values headline-worthy results over sound scientific methods, scientists pressured to constantly publish new papers will flood the scientific fields with false results. Many argue that this is happening now, and I have to say that I agree. It may look like great scientific progress when so many papers are published every year, but it is actually a disservice to both science and society as a whole when they do not follow the scientific method. If the general public is constantly bombarded with bad science and is not trained to tell the difference between science and pseudoscience, they will, justifiably, question all scientific results, even those from studies done with sound scientific methods, on topics such as climate change, vaccines that have existed for decades, or the shape of the earth. This is a big part of the reason I want to talk about the scientific method: it is not some obscure subject that only scientists should care about. It influences every one of us in society. When a politician uses scientific research to guide policy, how can we know whether that research is well executed, or bad science serving some agenda? I believe that every human has an epistemological responsibility to do our best to prevent the spread of inaccurate information, as bad information often leads to bad decisions. But to tell science apart from pseudoscience, we can't just look at superficial things, such as author names or publication journals; we must learn to use logic and the principles of the scientific method to analyze the study. What does the study propose to measure? Have others reproduced its results? Does the measurement deductively support the conclusion? Even then, we will still make mistakes from time to time, and therefore we should keep an open mind toward different opinions, be willing to accept new evidence and criticism from others, and above all, never stop learning.

Doing science is not just about forming theories based on what we see. It requires well-defined terminology, consistent measuring standards, and rigorous logic to form models that do not merely interpret the past but predict the future, so that we can design experiments to verify our predictions. Furthermore, science requires strong communication skills. It is not enough for us to get the results we wanted; we also need to communicate with others so that they can use our terminology, measuring standards, and models to make their own predictions. Despite the motto Nullius in Verba, science is inherently a form of human collaboration. We need both those who propose theories and those who reproduce results to do sound science. We should recognize the work of reproducing results as just as valuable as publishing new theories, because without reproducibility, those theories are pseudoscience at best.

Scientific Method and Probability

In previous episodes, I discussed the principles of the scientific method: observability, falsifiability, and reproducibility. I gave examples of those principles using scientific theories that make predictions of certainty, such as Newtonian mechanics. But in recent years, more and more scientific discoveries have come to rest on probabilistic predictions instead, especially in fields such as biology, sociology, and psychology. You may have heard of the p-value before; it is the cornerstone of many research papers. But unfortunately, there are many misconceptions about and misuses of the p-value, even among the researchers who use it in their publications. In 2005, the Greek researcher John Ioannidis published a paper named "Why Most Published Research Findings Are False". It noted that fallacies around the p-value can lead to false conclusions, even when the research follows industry-standard methodologies and passes peer review. Fifteen years since its publication, it has over 9,000 citations, and more and more researchers are advocating for methodological reform to overcome what they call "the reproducibility crisis". So what is the p-value? And how does it, or does it not, fit into the traditional scientific method formalized by Karl Popper?

Before I get to the p-value, let's first try to understand what probability is, as defined in mathematics. Statistics has become so ingrained in our everyday language that even those who have never formally studied it have some understanding when a news anchor proclaims that a political candidate has a 70% chance of winning an election. But what exactly does that mean? How was the probability measured? Did they create 100 parallel universes with conditions identical to the present, wait it out, and see that in 70 of them the candidate won? If the candidate does not win the election, does that mean the prediction was wrong? Can we even know whether the prediction was wrong if there is only one future that we live in? Mathematical concepts sometimes make it into our daily language, but unfortunately, more often than not, their true meaning is not preserved. For example, exponential growth, in mathematics, means growth that can be modeled by an exponential function; in everyday English, we just mean something grows really fast.

So, what does probability mean, mathematically? Consider a bag of 10 marbles, 6 red and 4 blue. I am going to use this example a lot throughout this episode. You draw one marble out at random; what is the chance of getting a red marble, and what is the chance of getting a blue one? I hope it is clear that the chance of drawing a red marble is 6 in 10, or 60%, and the chance of drawing a blue one is 4 in 10, or 40%. Here is the mathematical definition of probability: the probability of a given outcome is the ratio between the number of occurrences of that outcome and the total number of events, as we repeat the exercise an infinite number of times. In the case of the bag of marbles, the events are draws of a random marble from the bag. Even though the color of the marble we draw may differ, to our knowledge we cannot know a marble's color before drawing it out, so we consider each draw to be identical. If we want to calculate the probability of drawing a red marble, we just keep drawing marbles from the bag, examining their color, putting them back, and repeating the process. The probability of drawing a red marble is then the ratio between the number of red marbles drawn and the total number of marbles drawn, as we repeat the experiment an infinite number of times. As you will have noticed, this process requires repeating the experiment infinitely many times, which is of course not possible in reality. The computation also requires taking a ratio of one infinitely large number to another, but I will not go into mathematical limits here.

In the case of the marbles, we assume that all marbles have an equal chance of being drawn from the bag, a very, very important assumption whose implications I will examine later. This allows us to analytically calculate the probability using limits: the probability of getting a red marble is 60%, or 6/10, the number of red marbles over the total number of marbles. In the real world, we can never actually measure the true probability of anything, as that would require performing the experiment an infinite number of times. But fortunately, by the law of large numbers, proved by the mathematician Jacob Bernoulli in 1713, we know that with a large enough number of experiments, the empirical probability converges to the true probability. Unfortunately for us, how large the number of experiments has to be depends on the true probability, which in most cases we do not know; but in general, the larger the number of samples, the closer the result will be to the true probability. Using the marble example, say we draw marbles out of the bag 100 times and see 50 red marbles; our empirical probability of red is then 50%. If we repeat the experiment 1,000 times, we may see 610 red marbles, making our empirical probability 61%. In an oversimplified statement: the more experiments we conduct, the closer the empirical probability will be to the true probability.
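The law of large numbers is easy to watch in action with a short simulation of the marble draws, fixing the true probability of red at 0.6:

```python
import random

random.seed(0)
true_p = 0.6  # 6 red marbles out of 10

# Draw with replacement and compare the empirical ratio to the true one
for n in (100, 1_000, 100_000):
    reds = sum(random.random() < true_p for _ in range(n))
    print(n, reds / n)  # the empirical probability drifts toward 0.6 as n grows
```

With a handful of draws the empirical ratio can be off by several percentage points; with a hundred thousand draws it sits within a fraction of a percent of the true 60%.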

So there are actually three different types of probability: the analytic probability, which we calculate based on the assumption that each event is equally likely; the empirical probability, calculated from our empirical data; and the true probability, the probability we want but unfortunately cannot obtain. The true probability can only be estimated through analytic or empirical probabilities. When we see a science article or news journal using probabilities, we should first try to figure out which kind they are claiming, empirical or analytical, and then figure out how we can verify or reason about those probabilities.

Unless our claim is 100% or 0%, a sample size of 1 is not enough to draw any meaningful conclusion about a probability. If I claim I have a coin that comes up 80% heads and 20% tails, and I flip it once and get a tail, can you conclude for certain that I was lying? No. It may be tempting to conclude that there is an 80% chance I was lying, but that is another common mistake people make regarding conditional probability, which I will get to in a later episode. As for the win rates of election candidates: unless the two candidates have competed many times under identical circumstances, and every election and its conditions are unique, there is simply no meaningful way to compare the candidates' win rates or verify claims about their chances of winning. To verify the empirical probability behind someone's claim, we first need to see that they used a large number of samples. The larger the sample size, the closer the empirical probability can be to the true probability.

How about analytic probabilities? Analytic probability calculations rest on the assumption that all events have an equal chance of occurring, and breaking this assumption is often where sampling bias comes from. If I want to compute the analytic probability of drawing a red marble from the bag, I first assume each marble has an equal chance of being drawn, 1 in 10. Six of them are red, so there is a 6 in 10 chance of getting red. But "each marble has an equal chance of being drawn" is actually an extremely strong assumption; it essentially means that I know nothing about the marbles before drawing them. If, for instance, the blue marbles are slightly bigger than the red ones, and I tend to grab bigger marbles over smaller ones, then my chance of drawing red would not be the 60% the analytic calculation suggests. In the real world, we often make the mistake of assuming that each event has an equal chance of occurring when it doesn't. For instance, one common way to collect political polls is random phone calls to landlines. It may seem that a computer can give every landline number an equal probability of being called, but in reality, because older people are far more likely than younger people to have a landline and to answer it, the poll disproportionately represents the older population, and the analytic probability it computes is biased and does not equal the true probability. Similarly, throughout the 20th century, most medical trials and tests were done on white males averaging around 40 years of age. Many medical standards, such as healthy levels of biomarkers or doses needed for treatment, are based on average values for that specific demographic. But women and people of other races do not always share the same averages within their health groups, nor do they respond to drugs in exactly the same way as a middle-aged white male. Even though modern medicine is trying to be more inclusive in data collection, we are still living in the shadow of this bias, and much past data is still in need of correction.
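The bigger-marbles scenario is easy to simulate. In this sketch the doubled weight for blue marbles is a made-up bias of my own, purely for illustration:

```python
import random

# Hypothetical bias: blue marbles are "bigger", so each blue marble is
# twice as likely to be grabbed as each red one. The analytic answer of
# 60% (which assumed equal chances) no longer matches what we observe.
colors  = ["red"] * 6 + ["blue"] * 4
weights = [1] * 6 + [2] * 4   # assumed bias toward the bigger marbles

random.seed(0)
n = 100_000
reds = sum(1 for _ in range(n) if random.choices(colors, weights)[0] == "red")
# With these weights the actual chance of red is 6 / (6 + 8) = ~0.429.
print(f"empirical P(red) = {reds / n:.3f}  (equal-chance analytic said 0.600)")
```

The empirical result lands near 43%, not 60%: the calculation was fine, but the equal-chance assumption behind it was false, which is exactly how polling and medical-trial biases arise.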


Now that we know how to calculate probabilities, how should we interpret them? Let's look at the following problem. Back to our bag of 10 marbles, 6 red and 4 blue: if you choose one marble at random, what is the chance that the marble is to be red? We learned in middle school that the answer is 60%. But this is what the Buddha would call a question wrongly asked. There are two different interpretations of the phrase "to be". If the interpretation is: the chance that you take a random marble from the bag and that marble is red, then the chance is 60%. But what if the interpretation is: the chance that a given marble is red, given that it has been pulled out of the bag? For that, the answer is either 100% or 0%. You may be confused and think the two interpretations are the same, or at least sound the same, because many classes that teach probability phrase things both ways; in fact, the second interpretation is probably more common than the first. But I will show you why they are different, and why it is important to separate them in our daily language.

The first interpretation, the chance that you take a random marble from the bag and that marble is red, is a statement about the population, or the sampling space: specifically, the chance of getting a certain result when taking a random sample from that population. But the second interpretation, the chance that the given marble is red given that it has been pulled out of the bag, is a statement about the specific marble we have picked: what is the color of the marble we have just pulled out of the bag? The marble is either red or blue; it is not a mix of both until we observe it. As Schrödinger pointed out with his cat-in-the-box example, it is absurd to think of a large-scale object as being in a contradictory state until we observe it. The marble is not 60% red and 40% blue until observed; it is either red or blue. Now suppose you have pulled a marble from a bag with 6 red marbles and 4 blue marbles, and you say the marble has a 60% chance to be red and a 40% chance to be blue. Then, without looking at it, you put the marble into another bag, open that bag, and see 2 red marbles and 2 blue marbles. Is the chance of the marble being red now 50%? Did anything about the marble change when we moved it from one bag to the other? No. Then why should the chance of its color change?

Well, it shouldn't. If the marble were red, it would still be red; if it were blue, it would still be blue. It is not the marble's color that changed when we moved it from one bag to another, but our belief, or confidence, about it. Interpretation 1 is usually called the frequentist interpretation. Interpretation 2 comes from the Bayesian interpretation, but with some important, and incorrect, modifications. The Bayesian interpretation is about confidence, or belief, under incomplete information: when I draw a marble from the bag without looking at it, I am 60% confident that it is red and 40% confident that it is blue. This interpretation describes our subjective beliefs based on the evidence at hand, not objective reality. Interpretation 2 drops the subjective part: instead of talking about our confidence about the marble, it states the chance as if it were a raw fact about the marble. That is completely wrong and can lead to disastrous conclusions, which I will illustrate later. Probability is a characteristic of the population; it tells us little to nothing about individual samples from that population. We can use it to form beliefs about the marbles, but not conclusions about them.

Let's look at another problem. Consider a university at which 60% of registered students graduate in 4 years. You, a student newly accepted to the university, ask: what is your chance of graduating in 4 years? Mathematically, this is exactly the same problem as the marbles in a bag, but our interpretation of the numbers can be completely different. In the marble problem, because we know that which bag a marble is in does not influence its color, it is absurd to interpret the probability as the marble having a 60% chance to turn red when we pull it out of the bag. Here, though, we know that the university we attend does influence our graduation, so can we say that we have a 60% chance to graduate in 4 years, given that we attend a university with a 60% graduation rate? No; this conclusion has the same logical error as the marble problem. Say you are a hard-working student. Then even at a school with only a 60% graduation rate, you might be almost guaranteed to graduate in 4 years because of your work ethic. But if you are a hardcore slacker, you will most likely drop out within 4 years even though 60% of your class will graduate. Yes, a university can have a huge impact on someone's study habits, and a good professor can go a long way in someone's education, but this single number does not tell us how the university impacts anyone's education, or any given student's chance of graduating. The number 60% is a measurement of correlation: the ratio of the number of people graduating from the university in 4 years to the number of people attending it. It tells us neither the chance of a given individual graduating nor how the university influences a student's chance of graduation.
If we answer the question by saying that your chance of graduating is 60%, we are ignoring the fact that all students are different, coming from different socioeconomic backgrounds and having different study habits. We have also mistaken correlation for causation. The 60% is a correlation, but when we say your chance of graduating is 60%, we are in effect saying that attending the university gives you a 60% chance of graduating, and that how hard you work doesn't matter. If it does matter, and we do not know how hard you are working, then we cannot know how likely you are to graduate in 4 years, even though we know that 60% of the university's students do.

Wrong interpretations of probability not only lead us to wrong conclusions about our chances of graduating; they can also feed unconscious biases. Some may think I have so far just been arguing over semantics in discussing the phrase "to be" and its different interpretations. But I will show you why it is not just semantics, and why it is important to interpret mathematical quantities as closely as possible to what they actually measure. Consider the following question. We know that in the U.S., African Americans on average have less education and earn less than whites. If we have two candidates for a job, one African American and one white, can we say that the African American candidate is likely to be less educated than the white candidate? This question will probably make a lot of faces twitch immediately. But think about it: if we consider it fine to say there is a 60% chance that the marble we choose is red, why can't we conclude that the African American candidate is likely less educated than the white one? It is the same interpretation of probability. Many people accept this logic when they look at the numbers, because it is how they were taught in middle school to interpret probabilities. Yet this interpretation is completely wrong and can lead to biased conclusions, while the person who uses this reasoning and holds those biased beliefs may think they are being entirely logical, because this is how they learned to interpret probabilities in math class.

Mathematics and raw numbers tend to feel objective, because we learned in school that 1 + 1 = 2 is always true and not up for debate or interpretation. Except that it is. Whenever we translate a mathematical expression into English, or use mathematics to describe the physical world, we are interpreting the math and giving it real-world meaning. Many of those interpretations can be not only wrong but extremely misleading. If you are interested, in my previous episodes on the scientific method I had a deeper discussion of how we can verify whether our interpretation of a mathematical model is correct, using the scientific method and the linguistic agreement test. The language we use to express mathematical concepts not only shapes what conclusions we draw, but can also introduce bias into how we view the world.

In this episode, I want to emphasize the meaning of probabilities. A probability is a measurement of some characteristic of a population. In the marble example, the 60% probability is the chance of getting a red marble from the bag: a measurement of the outcome of performing an operation on the population. It tells us little to nothing about any specific individual in the population, and we should avoid drawing conclusions about individuals in a group from a probability we know about the group. The 60% chance is the chance of drawing a red marble from the bag; it does not tell us what the color of any given marble from the bag is. Similarly, a random member of a minority group sampled from the population is likely to have less education and income than a random white person sampled from it, but that tells us nothing about the income or education level of any specific person. Furthermore, even when certain characteristics are correlated within a population, probabilities alone do not tell us why they are correlated, and we should not draw conclusions about causality from correlation. There can be deep socio-political reasons why wealth and education are distributed the way they are now, and present-day data cannot tell us how we got here. We will have to look to the past for answers to those questions.

On a side note, I am not very comfortable with the use of the word "causality". The concept of causality often implies that in a chain of events, if one event had not happened, we would not be in our current situation. Causality is also often invoked in the context of assigning blame. If someone gets into a car accident, we attribute the cause to their driving or their disobedience of the law rather than to the invention of the automobile, even though without cars the accident would not have happened. But in the real world, many events are influenced by many factors, some of them human choices and some just random chance. For instance, the quality of someone's education can be influenced by the family they were born into, the teachers they encounter throughout their life, and their own personal effort. A person born in a poor country can work extremely hard and yet never reach the same level of education as someone from a developed country who pays little attention to school. We like to think all events in life, good or bad, are the results of personal choices, but that simplistic model does not reflect how the world works. In science, I prefer to think in terms of influence rather than causes. Instead of saying "science has found that A causes B", it is probably better to say "science has found that A has an influence on B". Yes, it is not as headline-worthy, but it avoids the inherently blame-assigning, subjective nature of the word "cause" and acknowledges the complexity of the interwoven parts of the world.

In the next episode, I will dive into conditional probability and the p-value, the bread and butter of modern scientists, and why, as Mark Twain said, there are three kinds of lies: lies, damned lies, and statistics.

Philosophy of Language: Relationship, and Numbers

In the last episode, I discussed one way in which words gain their meaning: definition by reference. When a community agrees to use the same word to refer to entities with some shared similarities, the word becomes a reference to those entities. Words like "cat", "dog", "cloud", and "rainbow" are examples of words defined by reference. But definition by reference can only apply to things we can point to with a finger; many words simply cannot be defined this way. For example, how can we use references to define the word "father"? We can point to pictures of men, but how would the learner know that we mean "father" and not just "man"? One way to clarify is to use pictures with an older adult male and a younger child and point to the older male. Let's leave aside the question of whether this method is sufficient to teach someone the concept of "father"; it illustrates something interesting regardless. Even though we only point to the older adult male as the reference for the word "father", the presence of the younger child is crucial in defining the concept. Without the child, with only the older male by himself, we cannot differentiate the concept of "father" from the concept of "man". The concept of father does not refer to a specific person or entity, but to a member of a group in which different members stand in specific relationships to one another. In this case, a father is a member of a family group bonded by a father-child relationship, and the child is the other member of that group. The existence of a father presupposes the existence of a child, and likewise the existence of a child presupposes the existence of a father. That is, when we talk about a father, we imply the existence of that father's child, even if we never explicitly mention the child.
We humans are constantly making such hidden assumptions, consciously and unconsciously, when using words defined by relationships. When we use the word "poor", we imply a state of abundance that we hope for; when we use the word "injustice", we imply a potential state of fairness and equality. This "baggage" that human language carries makes our communication depend not only on what is said, but also on what is left unsaid in context and subtext.

So, unlike concepts defined by reference, concepts defined by relationships are only valid within the group in which they are defined. The concept of a cat, for example, is defined by reference; no matter the context, I can always refer to the furry animal with pointy ears as a cat. This is not true for concepts defined by relationships, as is well illustrated in one of Plato's dialogues, Euthydemus, which contrasts rhetoric with philosophy, Sophism with Socratic dialogue. In the dialogue, the Sophist Dionysodorus asks whether Ctesippus has a dog, and Ctesippus answers, "yes, a real rogue". Dionysodorus then asks, "Has he got puppies?" "Yes," answers Ctesippus, "they are rogues just like him." "So the dog is a father then?" asks Dionysodorus. "Yes," answers Ctesippus. "And the dog is yours?" asks Dionysodorus. "Certainly," Ctesippus answers. "Thus he is a father, and yours, and accordingly the dog turns out to be your father, and you a brother of whelps," concludes Dionysodorus.

Philosophers often get a bad reputation for making ridiculous statements. Another famous example is the Dichotomy paradox proposed by Zeno of Elea; you have probably heard it before. Suppose you want to cross a street. To get to the other side, you must first walk halfway across; but to get halfway, you must first walk a quarter of the way; to walk a quarter of the way, you must first walk an eighth, and so on. In this way, crossing the street divides into infinitely many small tasks, each taking a finite amount of time, and adding infinitely many of them together would seem to take an infinite amount of time; therefore, you can never reach the other side. The story goes that Diogenes the Cynic, upon hearing the paradox, stood up and walked across the street to demonstrate the falsity of Zeno's conclusion. But Diogenes missed the point. Zeno certainly knew that he could cross the street before the end of the universe; what he was curious about was which step of the reasoning was mistaken and had led to the false conclusion. That turns out to be a far more interesting question. It was some 2000 years later that Newton demonstrated, using calculus, that the sum of some infinite series of numbers can be finite. I say demonstrated, not proved, because even though Newton's calculus performs the correct computation, many of his statements about its validity rested on self-evident arguments. It took more than another century before Bernard Bolzano and Karl Weierstrass provided the formal proofs.
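You can see the resolution numerically: the infinitely many leftover fractions of the street sum to something finite. A tiny sketch (the cutoff of 50 terms is just enough to exhaust double-precision arithmetic):

```python
# Partial sums of 1/2 + 1/4 + 1/8 + ...: infinitely many positive
# terms, yet the running total never exceeds 1 -- the street does
# get crossed after a finite total distance (and time).
total = 0.0
for k in range(1, 51):
    total += 1 / 2**k
    if k in (1, 2, 5, 10, 50):
        print(f"after {k:>2} terms: {total:.10f}")
```

Each partial sum equals 1 - 1/2^k, so the total creeps toward 1 without ever passing it; the faulty step in Zeno's reasoning is the assumption that infinitely many positive terms must add up to infinity.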

When we examine a statement, we should not only check whether the conclusion is right or wrong, but also get curious about why. If we think the conclusion is right, we should be able to provide logical arguments and evidence to support it. If we think it is wrong, we should be able to point out to our interlocutors where they made the logical error; to do that, we have to think from their perspective and reason from their beliefs and premises. As Bertrand Russell pointed out, if someone's conclusion is merely incompatible with our own beliefs, we can only say that, subjectively, we do not like it. Only when we have found logical contradictions within someone's belief system may we objectively conclude that they are wrong. This process demands that we try to understand other perspectives, and it also makes us examine our own arguments to avoid making the same logical errors in our reasoning. In Plato's Euthydemus, it is obvious that Ctesippus's dog was not his father; what is far more interesting is where the error in logic occurred. This comes back to how the concept of "father" is defined. Ctesippus's dog is Ctesippus's, and the dog is a father, but only in relation to its puppies. Unlike the word "cat", where a cat in any context is a cat, an entity is a father in some contexts but not in others. When we talk about the dog's relationship to its puppies, it is a father; when we talk about the dog's relationship to Ctesippus, it is not. This is the mistake the Sophist Dionysodorus made.

Language carries baggage: implicit meanings we may or may not intend to express when we speak. For example, when I say "This is the father", I imply that there exists at least one entity that is the child of the father I am referring to. Or if I say "I got a new computer", most people will infer that I had an old computer which the new one replaces, even though I never explicitly said so. In these two examples, the baggage the sentences carry may seem obvious. But in more abstract contexts, especially with words lacking a good definition, it can become hard to tell what baggage a sentence carries. When I say "This is a good choice", what does the word "good" imply? Does it mean the choice aligns with a standard we consider "good"? If so, what makes the standard good? And does there exist a bad standard in opposition to it? How can we define good at all? I know that many people, asked what is good, would reach for synonyms such as "desirable" or "right", but such definitions only push the problem further down: what do "desirable" and "right" mean? I will not dive too deep into the philosophy of ethics here, since I am discussing the philosophy of language, but I hope I have demonstrated that when we use a word as simple as "good", we may be unconsciously implying a great deal of context without even realizing it. I do want to invite you to think about how abstract concepts such as "good" are defined: by references, by relationships, or by something else? Can such concepts pass the linguistic agreement test? Do humans agree on what those concepts mean, and if so, how can we know that we agree?

Now back to the philosophy of language. Having introduced the idea of definition by relationships, allow me to attempt a definition of numbers. Numbers are quite fascinating: in countries with compulsory education, almost every member of society knows how to add and multiply them. Yet if you think carefully, you will realize that most of us were only taught in school how to do operations with numbers, never the definition of a number. Some may say that numbers are 1, 2, 3, 4, and so on. But those are examples of numbers, not a definition. Similarly, if you ask me what a chair is and I point to a chair, I have only given you an example of a chair, not told you what a chair is. A definition for a concept may not always exist; as we discussed in the last episode, there are many concepts for which we simply have no logically consistent definition, and as Wittgenstein pointed out, the meanings of words change constantly as different groups use them differently. But when a definition does exist, it should be able to tell us, unambiguously, whether an entity we examine fits it, and therefore is an example of the concept, or not. Given how obvious it seems that we can unambiguously tell whether something is a number, many philosophers believed that a definition of number must exist, and made many attempts to find it.

In ancient Greece, when algebra had not yet been invented and mathematical proofs were done geometrically rather than algebraically, Plato thought of geometric shapes, such as perfect squares, perfect circles, and perfect triangles, as real objects in an ideal world, with all the imperfect squares, circles, and triangles in our world mere shadows of the shapes in that higher reality. This is quite similar to how Plato understood other concepts: he believed the perfect version of a knife or a cat exists in a higher reality, and that when we see a knife or a cat, we compare what we see to the perfect version to determine what it is. This makes some sense for things we can see and touch, but it raises another question: what is the Platonic ideal of the number one in the higher reality, when we can use it both for one cup of water and for one meter? What commonality is shared between one cup of water and one meter?

Philosophers such as John Stuart Mill argued that numbers are properties of things, like color or hardness. If I see 3 apples, the 3 is an observed, factual property of the group of apples. Bertrand Russell countered this definition with the following example: take a pair of shoes. Depending on how we count, if we count individual shoes there are two of them, but if we count pairs there is only one. Yes, at different times and under different lighting conditions an object can show different colors, but under the same conditions an object should always display the same color. A pair of shoes, however, under the very same conditions, can be thought of as one or as two, simply depending on how we count.

Gottlob Frege addressed this issue by defining numbers as extensions of concepts. He said: 'the number of F's' is defined as the extension of the concept G, such that G is a concept equinumerous to F. But what does that even mean? Indeed, Frege used words such as "extension", "concept", and "equinumerous", and we have to know what those mean before we can understand his definition of number.

This leads to a very interesting problem. In Introduction to Mathematical Philosophy, Bertrand Russell pointed out, correctly, that if all words are defined in terms of other words, as dictionaries do, then we unavoidably end up with either infinite regress or circular definitions. So we must be able to give some words or concepts meaning without using other words, and we can only achieve that through definition by reference. Consider the definition of a cat: how can we define the concept of a cat to someone who cannot see, hear, or touch? We can describe how it looks, but they won't understand, because they cannot perceive sight; we can describe how it sounds, but they won't understand that either, because they cannot perceive sound. It is not just words describing the sound of a cat that make no sense to them; words about any sound make no sense to someone who cannot hear. There is no way to communicate our perception-based idea of a cat to this person. What a cat is, to us, is just the sum of our sensory experiences when we interact with one: how it looks, sounds, smells, feels. When our sensory experiences are close enough to the references we used when first learning the concept of a cat, we conclude that it is a cat we are interacting with.

In the philosophy of mind, our subjective experience when perceiving a physical phenomenon is called qualia. When we define words by reference, we are attaching a label, in this case the word "cat", to a collection of sensations, or qualia, that we have when interacting with the references, in this case cats. Defined this way, the concept of a cat is not defined by any words but by my sensations: I know a cat when I feel that it is a cat. Many philosophers have argued that we can never use words to communicate qualia; they must be understood through first-person experience. For someone who is blind, no words can make them understand the concept of a rainbow. Certain knowledge has to be gained by interacting with the physical world through our own senses, not by reading books or learning from others, and such concepts can only be defined through reference.

This, I think, is the fundamental limit of machine learning. It is not that computers are less capable of computation than humans; we have many examples of computers outperforming us at reasoning tasks. The computer Deep Blue defeated the world chess champion Garry Kasparov in a 1997 match, and modern computers have become literally hundreds of thousands of times faster since. The fundamental limitation of machine learning is that machines do not share our sensory experience, and therefore lack the qualia to understand the world as we do. Think about how difficult it is for a person in a privileged position to empathize with someone in poverty, and those are two humans with the same sensory organs and merely different experiences. How can a machine that neither senses the world as we do nor feels as we do see the world the way we see it? This, of course, does not mean machines cannot be intelligent or understand the world; it just means they will not understand it in the same way we do.

Now, let's get back to the definition of numbers. Allow me first to give 6 concepts without providing any word definitions; I will demonstrate later how we can use references to put these concepts into context and give them concrete meaning. Concepts without word definitions we call primitive concepts. The 6 primitive concepts of arithmetic are "1", "the set of numbers", "equals", "is in", "if…then…", and "add". These concepts are defined by the following relationships:

  1. 1 is in the set of numbers.
  2. If x is in the set of numbers, then x add 1 is in the set of numbers.
  3. If A add 1 equals B add 1, then A equals B.

These three relationships should be fairly easy to accept for anyone familiar with arithmetic. In plain English, the first says that 1 is a number; the second says that if x is a number, then x + 1 is also a number; and the third says that if A + 1 equals B + 1, then A equals B. Note that the rules above only declare relationships among the 6 concepts: "1", "the set of numbers", "equals", "is in", "if…then…", and "add", and they are composed only of those 6 concepts. The relationships do not tell us what the concepts mean in our physical world; I will get there. With these relationships defined, we can extend our definitions to more concepts: for instance, 2 is 1 add 1. Then we can define 3 as 2 add 1, and so on for 4, 5, and all the other numbers.
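These relationships are concrete enough to execute. Below is a minimal sketch of the construction in Python; the tuple encoding and the names `ONE`, `add_one`, and `equals` are my own illustrative choices, not anything prescribed by the relationships themselves:

```python
# A "number" is just something built from 1 by repeatedly applying
# "add 1". We represent 1 by an arbitrary token and "x add 1" by
# nesting -- the symbols carry no physical meaning on their own.
ONE = ("one",)

def add_one(x):
    """Rule 2: if x is in the set of numbers, so is x add 1."""
    return ("succ", x)

# Extended definitions, exactly as in the text: 2 is 1 add 1, etc.
TWO = add_one(ONE)
THREE = add_one(TWO)

def equals(a, b):
    """Structural equality. Rule 3 (A add 1 equals B add 1 implies
    A equals B) holds automatically under this encoding."""
    return a == b

print(equals(THREE, add_one(add_one(ONE))))  # prints True
```

Nothing here says what "1" or "add" refers to in the world; the code only enforces the relationships among the primitives, which is exactly the point.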

Let's say we have 1 orange. We can say it is a pile of 1 orange, though it is not much of a pile; toss in another orange, and now it is a pile of 2 oranges. We learned in elementary school that we can use arithmetic to calculate the number of oranges in a pile. If I have a pile of 4 oranges and a pile of 5 oranges, then when we combine them, without counting, we know we will get a pile of 9 oranges. But here is an intriguing question: how can we know for sure there are 9 oranges in the combined pile without counting them, from a calculation alone? This is the central question about the relationship between mathematics and the physical world. How is it possible to know something about the real world based only on prior knowledge and some seemingly made-up mathematical rules, without actually examining the world? How can we get the number of oranges in the combined pile, without counting it, just from our knowledge of the two original piles and the rules of arithmetic? And if we never count the combined pile, how can we be sure there are 9 oranges in it? Why should there be? The oranges don't crunch the numbers when we stack them together and decide to be 9 so as to obey the laws of arithmetic; oranges know nothing of arithmetic, they just stack. Why arithmetic can be used to model and predict the number of oranges in a pile turns out to be a rather complex question.
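The prediction-versus-counting distinction can be stated in a few lines. This sketch (the list-of-strings representation of a pile is my own stand-in for physical oranges) separates the arithmetic model from the act of counting:

```python
# The arithmetic model predicts the count of the combined pile
# without "counting" it; an explicit count then confirms the model.
pile_a = ["orange"] * 4
pile_b = ["orange"] * 5

predicted = 4 + 5              # pure arithmetic: no oranges involved
combined = pile_a + pile_b     # physically stacking the two piles
counted = len(combined)        # now we actually count

print(predicted, counted, predicted == counted)  # prints 9 9 True
```

The interesting fact is not that the two numbers agree in this run, but that arithmetic lets us commit to the answer 9 before the piles are ever combined.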

We learned in school that we can use arithmetic to get the number of oranges in the combined pile because it is self-evident, or just intuition. I would agree that the result of 9 oranges is intuitive to me as well, but that does not answer why 4 + 5 = 9 would give us the correct number of oranges without counting the combined pile explicitly. Furthermore, I have watched kids learn arithmetic, and it was neither intuitive nor simple for them when they were first exposed to those concepts. Only after hundreds and thousands of practice problems do they start to gain intuition about arithmetic, and this is also true for other knowledge. So intuition is largely based on how familiar we are with a concept, and once we are familiar with it, it is hard to remember what it felt like to be foreign to the concept when we first started learning it. Furthermore, for many people, general relativity and quantum mechanics are absurdly bizarre and completely counter-intuitive, yet those mathematical equations provide extremely accurate predictions about the physical world.

The key to understanding the relationship between mathematics and the physical world is how mathematical concepts are defined. As given above, arithmetic can be defined by 6 primitive concepts: “1”, “the set of numbers”, “equal to”, “is in”, “if…then…”, and “add”; and three relationships:

  1. 1 is in the set of numbers.
  2. If x is in the set of numbers, then x add 1 is in the set of numbers.
  3. If A add 1 equals B add 1, then A equals B.
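For readers who like to tinker, the three relationships above can be sketched in a few lines of Python. This is a toy model of my own making, not standard notation: a number is either the primitive 1 or something built from another number by “add 1”, and equality is decided by peeling “add 1” off both sides, exactly as relationship 3 allows.

```python
# Toy model: a "number" is either the primitive 1 (no predecessor)
# or the result of applying "add 1" to another number.
class Number:
    def __init__(self, pred=None):
        self.pred = pred  # the number this one was built from; None means 1

    def add_one(self):
        # Relationship 2: if x is a number, then x add 1 is a number.
        return Number(pred=self)

    def __eq__(self, other):
        # Relationship 3, read backwards: A add 1 equals B add 1
        # exactly when A equals B, all the way down to 1.
        if self.pred is None or other.pred is None:
            return self.pred is None and other.pred is None
        return self.pred == other.pred

one = Number()         # Relationship 1: 1 is in the set of numbers.
two = one.add_one()    # 2 is defined as 1 add 1.
three = two.add_one()  # 3 is defined as 2 add 1, and so on.
```

Nothing in this class says what a Number is in the physical world; it only encodes how the primitive concepts relate to each other, which is exactly the point.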

Those relationships tell us nothing about what the concepts are in the physical world, because, in different contexts, they mean different things. Take our piles of oranges for example.

For relationship 1: We use the symbol 1 to refer to a single orange. The concept of “the set of numbers” refers to all the possible piles of oranges. The concept of “is in”, as in “A is in B”, means “A is one of the possibilities of B”. Then the first relationship, 1 is in the set of numbers, means, in this context, that a single orange counts as a pile of oranges among all of the possible piles of oranges.

For relationship 2: We use the concept of “if A then B” to indicate a logical connection between two situations: when we observe situation A, we know B must be true even if we have not directly observed it. So the relationship “if x is in the set of numbers, then x add 1 is in the set of numbers” means that when we have a pile of oranges, after putting in another orange, it is still a pile of oranges. The concept of “add” here refers to the action of putting additional oranges into an existing pile or combining two piles together.

For relationship 3: We have two piles of oranges, and we put one additional orange into each pile. If, after the action, the two piles have an equal number of oranges, then even though we did not count before adding the additional oranges, we can still conclude that the two piles had an equal number of oranges before the additional ones were put in. One way to define “equal” is through the following action: take one orange from pile A, pair it with one orange from pile B, and put them away; repeat until either pile A or pile B has 1 orange left. If both of them have 1 orange left, pile A and pile B are equal. If one of them has 1 orange left and the other does not, they are not equal. I used this pairing action as the reference for “equal” instead of counting because counting oranges requires the use of general numbers, such as 2, 3, and 4, which are not part of the 6 concepts and 3 relationships given above. Since we have not yet defined general numbers, we cannot use them to define the concept of “equal”.
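The pairing action can be written out as a short Python sketch (a toy of my own, with oranges as list items). The point is that it decides equality without ever counting either pile as a whole; checking “has more than one orange left” stands in for glancing at the pile, not for counting it.

```python
# Decide whether two piles are "equal" by pairing oranges, one from
# each pile at a time, never counting either pile as a whole.
def piles_equal(pile_a, pile_b):
    pile_a, pile_b = list(pile_a), list(pile_b)  # work on copies
    while len(pile_a) > 1 and len(pile_b) > 1:
        pile_a.pop()  # take one orange from pile A...
        pile_b.pop()  # ...pair it with one from pile B, put both away
    # Equal exactly when both piles are down to their last orange.
    return len(pile_a) == 1 and len(pile_b) == 1
```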

In the example above, I have contextualized the 6 primitive concepts of arithmetic in the scenario of orange piles. In different situations, whether counting cups of water or the number of shoes, those primitive concepts will be contextualized by different physical references. It might be blindingly obvious how to count oranges, and it may seem that I am making things overly complicated for something as simple as 2 + 2 = 4. But I will show you why it is important to define mathematics as a set of primitive concepts and relationships that is detached from the physical world, and why those primitive concepts and relationships should be contextualized in each application separately. There is an idea called an indexical in the philosophy of language. An indexical word is a word that refers to different things in different contexts. Take the word “here”. “Here” for me, at the moment, is the apartment I am living in. But “here” for you is wherever you are at the moment. Similarly, numbers are indexical. There is no inherent 2-ness of the shoes when we say there are 2 shoes on the floor. We say there are 2 shoes on the floor because we have implicitly chosen to use a single shoe as the reference when we count shoes in this context. If we had chosen to use a pair of shoes as our reference for 1, then there would be only 1 pair of shoes. The meaning of 1 and 2 changes based on what context we choose, or in this specific case, what unit we choose.

Let’s examine the question: why is 2 + 2 = 4? In elementary school, we learned that 2 + 2 equals 4 because if we add 2 cups of water to 2 cups of water, we will get 4 cups of water. Similarly, 2 bags of candy and 2 bags of candy make 4 bags. The arithmetic rules are simply a reflection of how the physical world works. But what about adding 2 cups of water to 2 cups of pentanol? We will get 3.94 cups of the mixture, not 4 cups. Does that mean the laws of arithmetic should also include 2 + 2 = 3.94? Another example is the speed of light. We know that when measuring the speed of a human running on a moving train, we will get the sum of the human’s running speed and the speed of the train. But if we measure the speed of light on a moving train, we will still get the constant speed of light. Does that mean we should conclude that any number added to 299,792,458, the speed of light in m/s, will give us 299,792,458? Of course not.

2 + 2 = 4 is not true because adding 2 cups of water to 2 cups of water gives 4 cups of water, nor because 2 bags of candy and 2 bags of candy make 4 bags. We cannot use physical phenomena as the reason why 2 + 2 = 4, because while some physical phenomena match the arithmetic rules, as far as our experience has shown, others simply don’t obey them. I will not go into the details of the proof here, but 2 + 2 = 4 can be derived from the set of 6 primitive concepts and 3 relationships above, with no physical experience required. The reason we can use 2 + 2 = 4 to calculate the result of adding cups of water together is that, just like in the example of piles of oranges above, we can put those primitive concepts into context using definition by reference, and then test, using the scientific method, that the references of the 6 primitive concepts in the physical world satisfy the 3 relationships. The scientific method has verified that arithmetic is a suitable model for describing and predicting the physical phenomenon of adding water or combining oranges. For the water example, we can define 1 as a single cup of water, and the set of all numbers as all the possible volumes of water formed by combining single cups of water: 1 cup of water, 2 cups of water, 3 cups of water, and so on. Add can be defined as pouring two volumes of water together. With these definitions, we can design experiments to test that pouring cups of water together satisfies the 3 relationships listed above, so we can use arithmetic to calculate the results of combining water. But when combining water with pentanol, the second relationship is broken. If we define 1 as a single cup of liquid, either water or pentanol, adding two cups of liquid together will not result in a whole number of cups of liquid when one cup is water and the other is pentanol. Remember, we defined the set of all numbers as 1, 2, 3, 4, and so on: whole-number cups of liquid. Therefore arithmetic is not an accurate model for describing the phenomenon of mixing water and pentanol.
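To make the water-versus-pentanol point concrete, here is a toy Python check of relationship 2 in the two contexts. The function names and the contraction factor are my own illustrative assumptions; only the 3.94-cup figure comes from the example above.

```python
# Two physical versions of "add", both returning cups of liquid.
def add_water(cups_a, cups_b):
    return cups_a + cups_b  # water + water: volumes simply add

def add_water_pentanol(cups_water, cups_pentanol):
    # Illustrative contraction factor chosen so that 2 + 2 gives the
    # 3.94 cups quoted above (not real lab data).
    return (cups_water + cups_pentanol) * 0.985

def satisfies_rule_2(add, a, b):
    # Relationship 2 in this context: combining whole-cup volumes
    # must land back in "the set of numbers" (whole cups).
    return float(add(a, b)).is_integer()
```

Running `satisfies_rule_2` on the two operations shows plain water passing and the water-pentanol mixture failing, which is the sense in which arithmetic stops being an accurate model for the second phenomenon.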

Science is the bridge between the physical world and mathematics. We use the scientific method to test whether a given physical phenomenon shares the same relationships as the mathematical concepts we have defined, and which linguistic model can best be used to describe and predict each given phenomenon. We know that we can use the rules of arithmetic to calculate the number of oranges in a pile because we have applied the scientific method again and again and verified that arithmetic has always been a good model for predicting the number of oranges when combining piles of oranges. For other phenomena, such as mixing liquids or combining the speeds of fast-moving objects in different frames of reference, the rules of arithmetic are only an approximation.

Mathematics is an intellectual construct defined in a vacuum, entirely separate from the physical world by design. But that does not mean that it is completely made up. As we have seen above, many physical phenomena, from adding oranges to a pile to light traveling through space, share the same relationships as our mathematical concepts. To be honest, we do not yet know why objects in the physical world can, and do, share the same relationships with pure intellectual constructs. This is one of the greatest mysteries of this fascinating world we live in. Mathematics is just like other languages: a way to help us understand the physical world and to communicate our understanding to others.

Thank you for listening to the podcast! If you like it, please share it with your friends! I love asking difficult and interesting questions, and I hope that you do as well.

Philosophy of Language: Senses and References

Have you ever thought about how we can know that the red color in my mind is the same red color in your mind when we see the same image?

The early 20th-century Austrian-British philosopher Ludwig Wittgenstein proposed the following thought experiment: Consider a group of people in a room, each of them holding a box. Only the person who holds a box can see what is inside it; no one can peek into other people’s boxes. All of the people refer to what is inside their box as a “beetle”. When I say something is a “beetle”, I am comparing what I see around the room to the “beetle” in my box, and when you say something is a “beetle”, you are comparing what you see around the room to the “beetle” in your box. But how can I know that what is in my box is the same as what is in your box? Wittgenstein said it is unknowable. Language is private, and we can never perfectly communicate our subjective experience, only provide some analogies using language. I cannot show you exactly what is inside my box, but I can point to things in the world that I think are similar to what is inside my box. I cannot make you feel what I feel, but I can describe what has happened to me, or when you see my laughs or tears, you can use the times you laughed or cried as a proxy for understanding my feelings at that moment.

Empathy is trying to imagine how others have felt by putting ourselves in their situation. But it is difficult to empathize with others when we lack the subjective experience. I can read about people living in poverty and their living conditions, but is reading about it sufficient for me to feel the struggle and hardship they are experiencing? An estimated 2.5 billion people lack access to improved sanitation. That is more than 35% of the world’s population. But it is difficult for me to empathize with them because, for as long as I can remember, I have always been able to turn on the faucet and have clean water come out of it. Still, I do not think our lack of subjective experience is an excuse for not trying to understand. I may never fully know what the beetle inside your box is, but with effort, I can get a better and better picture of it. Even though it is not perfect, language can help make my thoughts and feelings as transparent to you as possible, bridge the gap between our understandings, and make us feel less alone in a world in which we can never show others our own subjective experiences.

If we can never peek into other people’s subjective experience, how do we ever manage to understand each other at all? Let’s get back to the example of the colors. How can I communicate the color red to you and know that you agree with me on the meaning of the word “red”? Consider the following experiment: you sit in front of a monitor that shows you one pure color patch at a time. If you consider the color shown to be red, you answer yes; otherwise you answer no. Another participant is given the same color patches at the same time and asked the same question. In this experiment, if your answers and the other participant’s answers always match for every color patch, then we can conclude that you two agree on what is red. This test is completely behavioral, that is, based on what is observable, i.e. our answers. It does not matter how I know the color is red, or how the color red makes me feel; as long as the participants give the same answers, they agree on the meaning of the word. Then, when you ask the other participant to pick out a red object from a pile, they will pick the object whose color you consider red, because the two of you agree on the concept of red.
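The behavioral test just described is simple enough to sketch in Python. The two “judges” below are hypothetical stand-ins for the participants; each maps a color patch to a yes/no answer, and agreement means the answers always match. Note that the test never looks inside a judge, only at its answers.

```python
# Two participants modeled as functions from an (R, G, B) patch to a
# yes/no answer to "is this red?". The test only compares answers.
def agree_on_concept(judge_a, judge_b, patches):
    return all(judge_a(p) == judge_b(p) for p in patches)

# Hypothetical judges, each with their own internal criterion for "red".
patches = [(255, 0, 0), (0, 128, 0), (200, 30, 30), (13, 152, 186)]
you = lambda rgb: rgb[0] > 150 and rgb[1] < 100
other = lambda rgb: rgb[0] > 150 and rgb[1] < 100
```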

Now consider the following thought experiment: Imagine there is a girl named Denise with an extremely rare condition: in her mind, the colors red and green are switched. That is, when she sees an object of the color red, she sees what you consider green in her mind, and when she sees an object of the color green, she sees what you consider red. But she has had this condition since she was a baby, before she learned language. When her parents taught her colors, they pointed to what they considered red objects and taught her the concept of red. So she has associated the word “red” with her perception of green. When you ask her the color of a red object, even though she perceives it as green in her mind, because she has associated the sense of green with the word “red” from the very beginning, when she first started to learn language, she will still answer that the object is “red”. And there is no way for us to figure out that she sees the color red as we see the color green, because whenever we ask her the color of a given object, she always gives us the word we expect.

There is a common conception that linguistic understanding is an activity of the mind that is unique to humans. But in fact, linguistic understanding is a measurable behavior. In a way, we can view language as a measurement we perform based on our sensory experience. When I see the color red and say “It is red”, I am converting my visual perception into a string of symbols. And this is not unique to humans. In research conducted by Doctor Ramirez, dogs were trained to count, even though they don’t write or speak a human language. The researchers had a white tray in which they would put a number of different small objects. Then several whiteboards were presented to the dogs, each with a different number of dots. The dogs were trained to pick out the whiteboard with the same number of dots as the number of small objects in the tray. The trained dogs could consistently select the correct whiteboard based on the number of objects in the tray; that is, they understood the concept of numbers, even though they did not know the symbols we use for numbers. Similarly, chimpanzees can be trained to use sign language for different objects. When asked what object they saw, they can consistently associate the same sign with the same objects. So they also have linguistic understanding.

Now, in the color experiment above, I said that if you and the other participant always agree on whether a given color patch is red or not, then we may conclude that you two agree on the concept of red. But you have not seen the other participant in person. What if the other participant, instead of being a human, is actually an artificial intelligence? Do you two still agree on the concept of red? Does the artificial intelligence understand what you mean by the word “red”? I think it does. Even if a human perceives red as green and green as red, as long as they pass the linguistic agreement test, you two agree on the concept, and there is no way for us to know whether the colors in other people’s minds are scrambled. Similarly, it doesn’t matter how machines perceive color or process words; as long as they pass the linguistic agreement tests, they understand the linguistic concept just as we do.

It is a common conception that understanding and intelligence are uniquely human traits. To have intelligence is to think like a human, whatever that means. We know that computers can do multiplication, but we don’t think that computers understand the intuition of multiplication, because we often associate understanding with the Eureka moment and the feeling of something in our minds suddenly clicking. But that is a mistake. Sudden epiphany has very little to do with real understanding, and the feeling of understanding something is not the same as actually understanding it. There were plenty of times in college when I thought I understood some concept, only to find out otherwise when I made horrible mistakes on tests. How do we know we understand something, then? Well, we need to pass the linguistic agreement test on that subject. How do we know a student knows how to do multiplication? We throw a bunch of multiplication questions at them and see if they get the same answers as ours. If they do, they understand multiplication. If, when a human passes the test, we consider that they understand the concept of multiplication, then to be fair we should also consider a computer that passes the same test to understand the concept of multiplication. If we think computers cannot understand multiplication even after they have passed the same test we use for humans, just because they are not human, that is a bit prejudiced, isn’t it?
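The multiplication version of the test is easy to sketch. The candidate below is a hypothetical function standing in for whoever is being tested, whether a student with a pencil or a CPU; passing is passing either way.

```python
import random

# Throw a batch of multiplication questions at a candidate and check
# whether its answers always match ours.
def understands_multiplication(candidate, trials=100):
    rng = random.Random(0)  # fixed seed so the quiz is repeatable
    for _ in range(trials):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        if candidate(a, b) != a * b:
            return False  # e.g. a candidate answering 3 * 3 = 7 fails
    return True
```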

Now, if we define understanding a concept as passing the linguistic agreement test, then the only concepts that can be understood are the ones that can be defined without logical contradiction. Take the concept of multiplication as an example. We can test someone’s understanding of multiplication using a linguistic agreement test, because given two numbers, there is an unambiguously correct product, and anyone, whether a human or a machine, who really understands multiplication should give that answer. If someone tells me that 3 times 3 equals 7 in the decimal system, I can say that they do not understand multiplication, even if they feel that they do. So how about non-mathematical concepts, such as freedom or justice? Given a policy, we all probably have some opinion about whether it hinders or promotes freedom. We can use this as the linguistic agreement test for the concept of freedom: each participant is given a policy and its intended goal, and answers whether they consider the given policy good or bad for freedom. Anyone who has ever had a debate about politics would understand that there is no chance freedom can pass this test, at least not in our current society. This is largely because freedom means different things to different people at different times. To revisit the example I used in the last episode, if we consider freedom to be not being owned by other human beings, then the emancipation of the slaves was good for freedom. But if we define freedom as a market free of government regulation, then emancipation infringed on the free market. The problem is that, to many people, freedom means both. As the two definitions conflict, for any policy we can pick and choose which definition we want to apply and get different conclusions. This makes the answer to the question completely arbitrary and subjective. In such a case, I am not sure what it means to understand an inherently inconsistent concept.


So, how do words gain their meaning? The English word chicken is pollo in Spanish, or 鸡 in Chinese. The same concept can be associated with different symbols in different languages, and there is no law of nature dictating that the seven letters of “chicken” have to mean the feathery beasts that are the descendants of the dinosaurs. How did the symbols get the meanings they are now associated with?

Wittgenstein said that the meaning of a word is its use. When a word is used to refer to a certain object, and other people in the same community also start to use the same word to refer to the same object, the word gains its meaning as a reference to that object in its linguistic community. This process can be called definition by reference. It is how we learn most words. We learned the concept of cats by being given many, many different references to cats: pictures, videos, and drawings. Many philosophers have debated whether universal concepts exist. Of course, whoever has played with a cat would agree that a reference to a specific cat does exist, as they have touched the cat, felt its fur, and maybe even been scratched by it. What the philosophers have debated is whether there exists a perfect cat, that is, the Platonic idea of all cats. Some philosophers believed that when we are determining whether something is a cat or not, we are, in our minds, comparing the thing we see and touch to the Platonic idea of cats, and if what we see and touch feels close enough to the perfect cat in our minds, then we say it is a cat; otherwise, we say it is not. For each of the concepts, such as cats, dogs, multiplication, or freedom, is there a Platonic idea of the concept that we can all agree on? This turns out to be a far trickier question.

One of the earliest recorded dialogues on this problem is called the Sorites Paradox, or the paradox of the heap, attributed to Eubulides of Miletus. Consider a heap of 1 million grains of sand. We remove a single grain; it is still visibly a heap. We remove a single grain again; it is still visibly a heap. If we continue removing the grains, one by one, until there is only one grain left, is it still a heap? If not, when exactly does it stop being a heap? Where that line lies differs from person to person and depends deeply on what references we used to learn the concept. If larger piles of grains were used as references when learning the concept of a heap, the person would likely draw the line earlier than someone who learned the concept with smaller piles. When defined by reference, the meanings of words become subjective to each person’s experience, as each of us has a different set of examples we used to learn each word.
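Here is a toy Python illustration of that last observation. The learning rule below, calling anything at least half the size of the smallest example a heap, is entirely made up; the point is only that different example sets put the line in different places.

```python
# A made-up rule: a person calls something a heap if it is at least
# half the size of the smallest example heap they learned the word from.
def learn_heap_line(example_heaps):
    return min(example_heaps) // 2

def is_heap(grains, line):
    return grains >= line

big_learner = learn_heap_line([1000, 5000])  # learned from big piles
small_learner = learn_heap_line([20, 50])    # learned from small piles
```

A pile of 100 grains is a heap to one learner and not to the other, even though both apply the same rule to their own examples.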

[Image: a color patch of turquoise (Aotaki Blue Green, MMP-113)]

Here is another example: the color turquoise. If you are interested in looking it up, the hex number of the color is #0D98BA. The color is between blue and green. But if you ask someone to put it into either the category of blue or the category of green, you will find something interesting. People who grew up in Western cultures are more likely to categorize it as blue, while people from Eastern cultures are more likely to categorize it as green. That is, even though both cultures have concepts of both blue and green, the ranges of the colors differ from culture to culture. Furthermore, what we consider essential to a concept may also vary. For example, to some people, one of the essential properties of cats is their fur, and they would have a difficult time accepting that a hairless cat is a real cat. Here is another example from my personal experience: where I grew up, all the popcorn served is coated with sugar or honey. So for me, one of the essential properties of popcorn is its sweetness, and I was, and still am, having a hard time accepting that the salty snack served in American theaters is popcorn, even though, you know, it is popped corn.

This phenomenon illustrates the difference between what Wittgenstein called references and senses. References are the physical things we refer to in our language, but senses are what the physical objects make us feel, from our sensory experience of petting a kitten to our emotional response to a warm meal after a long and exhausting day. Different people, when looking at the same reference, may have completely different senses. One obvious example is two people looking at the same rainbow when one of them is color blind. Even though they are seeing the same object, their senses are different, as they see different colors. But it is not just different sensory responses that may result in different senses; our personal experience can also lead to different senses. Suppose that when you were growing up, every year you went to your grandma’s for Christmas, and she would bake fresh gingerbread cookies for you. When you share a gingerbread cookie with a friend who has never celebrated Christmas before, even though you two are sharing the same cookie, what the cookie means to each of you is completely different. It might bring you back to the time when you were a kid running around the living room next to the Christmas tree, but to your friend, it is just a weird-flavored cookie.

In our daily conversations with friends and family, we rarely try only to communicate raw facts; what we usually want is to share what we feel and think with those who are close to us. But our feelings are just like the beetles in the boxes. I can tell my friend that I am happy, angry, or sad in words, but there is no way for me to directly share what I am feeling at this exact moment. The best I can do is to share the events that have made me feel how I feel, and hope that, with empathy, my friend would feel the same as I did in the same circumstances. That’s why we share our gingerbread cookies with our friends and go watch movies together: we hope that with shared experiences, we will also share senses. And it works, more or less, for people who grew up in a similar culture. If it is the cultural norm to spend holidays at grandma’s eating gingerbread cookies, sharing a gingerbread cookie with your friend is likely to remind both of you of the time spent with your families. But such expectations can also become an empathy barrier for those coming from a different culture. If the gingerbread cookie reminds you and all of your friends of good times with family, then when someone from a different culture dislikes gingerbread cookies, it can be easy to label them as weird and other, if we hold the belief, either implicitly or explicitly, that everyone should feel the same in response to the same experience.

Even though we cannot directly share our subjective senses, we can still use expressions, words, and body language to illustrate them. Others cannot directly feel the joy I feel. To me, the reference of the word joy is how I feel internally, but to others, it is what they see and hear: the smile on my face, the excitement in my voice, and my dancing feet. And this differs from culture to culture. In a more conservative culture, a smile may be all that people show when they are ecstatic, while in a more outgoing culture, it may be acceptable, if not common, to go around and hug everyone when they are happy. There are different cultural expectations about how to show that you are happy; even though two people may be feeling the same amount of happiness, perceivers may read them differently based on their cultural expectations. If I expect others to show excitement by jumping up and down, I might jump to the wrong conclusion about how happy someone from a more conservative culture feels. One of the great wonders of humans is our ability to use language. We have invented ways to share our thoughts and feelings that others cannot directly observe. But we should also be careful not to jump to conclusions about other people’s feelings and intentions, because the only one who knows how they really feel is the person who is experiencing it.

This definition by example is pretty useful when words are references to a physical object that we can point to with our fingers. But it comes up a bit short for words that are defined by relationships. Take the word “father”, for instance. To learn the concept of a father, we can show a picture of two people, an older male and a younger person. The concept of “father” is not defined in isolation but as part of a relationship pair. Two people can be labeled, one as a “father” and the other as a “child”, if the relationship between them resembles what we consider a father-child dynamic. So the word “father” does not refer to a specific entity, but is the label we give to one side of a relationship, and “father” and “child” have to exist together; there is no father without a child and no child without a father.

In the next episode, I will continue the discussion of definition by relationship, and I will try to answer the question: is mathematics invented or discovered? I will also answer a question that has troubled philosophers since Plato: what exactly are numbers? Stay tuned!

Before I end this episode, I want to challenge the listener to take some time to think about the words we use. What does it mean to be fair, just, or equal? Try not only to make sense of those words for yourself, but also to understand what they mean to others. Keep in mind that the same words can mean different things to different people, and that language is here to help us make our thoughts and ideas transparent to each other. So let us try to understand others, even if we may not agree with them.

Scientific Method: 3. Falsifiability

Mathematics often gets a bad reputation among students. It is often perceived as abstract, confusing, useless, and completely made up. Those perceptions are not all true. Math does not just deal with obscure numbers and equations. It is a language for describing relationships between, well, everything. The fact is, every sentence we speak or think of is a description of the relationship between different entities. Take the following sentence as an example: “I ate a peanut butter sandwich for lunch yesterday.” The sentence, at first glance, may seem to be a trivial description of an unremarkable event. But it is, in fact, packed with information. It describes the relationship between the sandwich and me, that is, I have eaten it. It contains the relationship between the event and the time at which it occurred: “for lunch”, and “yesterday”. It also informs us of the relationship between the sandwich and its constitution: it is made with peanut butter. If I said: “I eat a peanut butter sandwich for lunch yesterday.” or “I ate a peanut butter sandwiches for lunch yesterday.”, a native speaker would immediately recognize my grammar errors; specifically, the present-tense verb does not agree with the past time, and the plural noun does not agree with the singular article. Even though we don’t usually think about it when we speak, our daily language does have a set of rules that the composition of words should follow, to eliminate ambiguity when we communicate. We know those rules as grammar. If I were to break the rules and say, “I ate a peanut butter sandwiches for lunch yesterday.”, the communication becomes unclear, as the listener cannot tell how many sandwiches I consumed from the sentence alone. Good communication requires fewer contradictions in the language we use, and grammar facilitates this need to a certain extent. But as anyone who has ever tried to communicate knows, what people think we meant is often not what we wanted to express.
In the last episode, I illustrated how the word “freedom” can mean different things to different people and ultimately lead to opposite conclusions about how beneficial the emancipation of slaves was to freedom. This ambiguity and flexibility of natural language allow us to compose sonnets and plays, but they also make it challenging, in many cases, to communicate transparently and clearly.

Mathematics, similarly, is a way to describe relationships between entities. 2 + 3 = 5 expresses the equality between putting 2 and 3 together and a 5. It is somewhat similar to saying “if you put 2 apples in a basket, then another 3 in the same basket, there will be 5 apples in this basket.” What distinguishes the language of mathematics from the language of English is that the grammar of math is far more restrictive, and the language of mathematics is designed to be free of contradictions. Any two statements made in the same mathematical language cannot contradict each other. It is not possible to get 2 + 3 = 5 and 2 + 3 = 4 at the same time if we follow the mathematical language correctly. But this is not the case for natural language: as the example of “freedom” demonstrated, it is possible to draw conclusions such as “the emancipation of the slaves was good for freedom, but was also bad for freedom.”

A linguistic system that is designed to be free of logical contradictions is called a formal system, or a formal language. In contrast, a linguistic system that can contain logical contradictions is called an informal system, or a natural language. An obvious example of a natural language is English. Mathematics is the study of all formal languages and systems. But formal systems are not just numbers and equations. Video games, for example, are formal systems, at least those without many bugs. They play by a set of very specific rules, and all players are forced by the program to behave according to those rules. When those rules are enforced, the game cannot get into a logically inconsistent state: two players will not occupy the same virtual space, and the game will usually end with only one winner. Video games are not the only formal systems we commonly interact with; humans have been designing formal systems for millennia. Board games such as chess and go are formal systems, and so are card games such as poker and blackjack. They all have a fixed set of rules, and if all players choose to follow them, the state of the game will be consistent. In a chess game, at the end of each turn, no two live pieces can occupy the same square. In poker, if someone ends up with more cards than the others, or five kings show up, we know something fishy is going on. By agreeing on the rules of play, something very remarkable happens: unlike our daily arguments, in which people often have a hard time agreeing on a conclusion even when presented with the same evidence, when shown a chessboard or two poker hands, people agree on who is the winner and who is the loser. This power of agreement enabled by formal systems has led more and more scientists to favor mathematics over natural languages as the language for scientific theories. An objective standard should be one that everyone agrees on, and the correct usage of mathematics can make that happen.

But unfortunately, simply because we agree on the rules does not always mean the rules constitute a contradiction-free formal system. We can make up a simple game: two players each draw a number between 1 and 10, and if a player draws a 2 or a 4, she wins the game. You might immediately notice two problems. First, if one player draws a 2 and the other draws a 4, who wins the game? Second, what if neither of them draws those two numbers? You see, not all sets of rules using numbers lead to unambiguous answers. Many scientists have a set of rules they play by to draw conclusions, such as statistical significance tests, but those rules don’t always lead to an unambiguous answer about the validity of a theory. That is, sometimes the rules may say the theory is right, and other times the rules may say that it is wrong. But I will get to statistics later.

I will start with the set of rules mathematicians play by: logic. Before I define logic, I want to make two concepts clear: premises and conclusions. In an argument, premises are statements that we hold true without providing any justification. We take the truth of the premises for granted; they are the entry point of an argument. If every statement required justification from other statements, we would unavoidably end up with either an infinite regress or circular reasoning. What’s wrong with infinite regress? Consider the “why” game kids like to play. “Why does the Sun rise every morning?” “Well, it is because the Earth rotates on its axis.” “Why does the Earth rotate on its axis?” “It is probably because when Earth was formed, the dust, rocks, and gases orbiting its core transferred their angular momentum to Earth.” “Why can angular momentum be transferred from one object to another?” “…” You see, we can keep asking why, but at some point we will find ourselves stuck at the end of a chain we cannot answer; even God cannot save us. We can say, “That’s how God created the world.” But then how do we answer the question “Why did God create the world the way it is?” The problem with infinite regress is that if we keep asking why, for all questions, we will get stuck at some point; therefore no statement can be sufficiently justified without assuming something to be true in the first place.

What about circular reasoning? Take the following example: “You must obey the law because it’s illegal to break the law.” The premise is that it is illegal, or wrong, to break the law; the conclusion is that it is therefore right to obey the law. But saying it is wrong to break the law is equivalent to saying it is right to obey the law. That is, the premise and the conclusion are essentially the same in this argument. Arguments whose conclusions are contained in their premises are called “petitio principii”, or by the unfortunate English translation, “begging the question”. It would probably be better translated as “assuming the question”, but the translation has already stuck. Begging the question is the simplest but most common form of circular reasoning. You may have noticed that the argument “You must obey the law because it’s illegal to break the law.” says nothing about what the law is, or should be. The law may say not to steal, or it might force people into a cage to fight to the death in a battle royale, as in the Hunger Games. Some may say that the law only enforces good things, and therefore we should obey it, which is again begging the question: it assumes the law is good and then concludes that obeying it is good. The issue with circular reasoning is that we can use it to prove practically anything; all we need to do is assume what we want to prove to be right in the first place.

Logic is one type of reasoning process that carries an argument from premises to conclusions. I want to be clear here: I am using the word “logic” very narrowly. It means exclusively mathematical logic, or deductive reasoning, by which true premises always lead to true conclusions. There are other types of reasoning as well, in which true premises can sometimes lead to false conclusions. Those are fallacies; circular reasoning and infinite regress are two of them.

Consider the following argument: “All humans are mortal; I am a human; therefore I am mortal.” The two premises here are “All humans are mortal” and “I am a human”. The conclusion is “I am mortal”. It is called deductive reasoning because it uses an assumption about the general case, the mortality of all human beings, to deduce a specific case: am I mortal? Deductive reasoning is the foundation of mathematics. In mathematics, we start with definitions, statements about the general that we hold true, and from there we draw conclusions for special cases. For instance, we can start with the premise that the sum of the interior angles of every triangle is 180 degrees; then, for a right triangle with one 90-degree angle, the sum of the other two must be 90 degrees. We have just deduced a property of a special set of triangles, the right triangles, based on our assumption about all triangles.
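Written out, the deduction is a two-step chain from the general premises to the specific conclusion:

```latex
\underbrace{\alpha + \beta + \gamma = 180^{\circ}}_{\text{premise: all triangles}}
\quad\text{and}\quad
\underbrace{\gamma = 90^{\circ}}_{\text{premise: this right triangle}}
\;\implies\;
\alpha + \beta = 180^{\circ} - 90^{\circ} = 90^{\circ}
```

If the premises are true, the conclusion cannot fail to be true; that is the defining property of deduction.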

Instead of going from the general to the specific, we can also go from the specific to the general in mathematics, using what is called “reductio ad absurdum”, or proof by contradiction. It goes as follows: I start by saying “all mammals have tails”; you want to show me that I am wrong, so you point out that I do not have a tail, and therefore “all mammals have tails” cannot be true. When we make broad statements about the general, we can show a statement to be false by providing a counter-example, or by deriving a contradiction. All mathematical proofs, without exception, are based either on deductive reasoning or on proof by contradiction. If you pay attention, you will realize that all of our daily sentences are either broad statements about a group or statements about specific cases. In mathematics, we call a statement about all members of a group a universal quantification. It can be “all humans are mortal” or “all mammals have tails”, or, in the real world, “all politicians are corrupt” or “all college graduates are smart”. You may immediately realize that making broad generalizations about human groups simply doesn’t work in most cases. It is because of how those groups are formed. In mathematics, all triangles have three edges because we call a shape a triangle if it has three straight edges connected to each other. But there isn’t a social club exclusive to corrupt people that calls itself “the politicians”. The group of triangles is created based on a uniquely shared set of properties: the number of edges, how the edges connect, and so on. Groups in mathematics are defined by definitions. Human groups, by contrast, form more organically. There is usually a shared trait that most people in the group have; for example, most politicians are involved with government in some capacity, but some politicians may call themselves politicians simply because they want to break into that inner circle.
The groups in our daily usage are generally defined by examples. This blurriness of human groups has inspired many sciences that study human behavior, such as sociology, psychology, economics, and biology, to favor statistics for analyzing humans.

Statistics is a can of worms that deserves its own episode later, but I will say this here. When used correctly, statistics can provide incredible insights into human nature. But when used wrongly, it can be just as problematic as all the other fallacies. We often think mathematics and numbers are objective, and therefore, if we use them and do the calculations correctly, our conclusion must be true. That is, to put it bluntly, dead wrong. Here is an example: I claim that if we put 2 ducks into 2 liters of water, we will get 4 duck-water, because 2 plus 2 equals 4. It is obviously absurd, because we can’t add 2 ducks to 2 liters of water. But even if we add 2 liters of water to 2 liters of propanol, due to the space between molecules, we will get about 3.94 liters of the mixture, not 4. That is, algebraic addition is not a good model for this phenomenon. If we have a bad model, we can do all the math right and our conclusion will still be wrong. By the way, this is itself a “reductio ad absurdum”, or proof by contradiction, argument for why not all arguments that use numbers are correct.
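The point can be made concrete in a few lines of code. This is a minimal sketch; the 3.94-liter figure comes from the text above, and the function name is my own invention:

```python
def naive_additive_model(v1, v2):
    """A model that assumes volumes add like numbers: V = v1 + v2."""
    return v1 + v2

# Mixing 2 L of water with 2 L of propanol: molecules slip into the
# gaps between each other, so the measured volume is about 3.94 L.
measured = 3.94
predicted = naive_additive_model(2.0, 2.0)

# The arithmetic is flawless, yet the prediction is still wrong,
# because addition was the wrong model for this phenomenon.
print(predicted)                       # 4.0
print(round(predicted - measured, 2))  # 0.06 liters unaccounted for
```

The error lives in the choice of model, not in the calculation; no amount of correct arithmetic can repair a bad model.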

In contrast to the universal quantifier, which asserts a property shared by every member of a group, the existential quantifier asserts a property of specific cases. For instance, the claim that all swans are white is a universal quantification about every swan. But the statements “there exist swans that are not white” and “some swans are not white” are existential quantifications: they claim that at least one swan in this universe is not white.

English speakers know that “Not all swans are white” is not the same as “All swans are not white”. The first statement is an existential quantification, claiming that some swans are not white; the latter is a universal quantification, claiming that all swans share the same property of not being white.

The negation of a universal quantifier is an existential quantifier, and vice versa. For example, what is the opposite of “All swans are white”? It is not “All swans are not white”, but “Not all swans are white”. I should first define what “opposition” and “negation” mean in logic. Two statements A and B are opposite to each other if they cannot both be true at the same time, and also one of them must be true, that is, they cannot both be false at the same time either. If you care about the mathematical jargon, two statements that cannot both be true are called mutually exclusive, while two statements that cannot both be false are called complementary. So two statements are opposites if they are both mutually exclusive and complementary. “All swans are white” and “All swans are not white” are mutually exclusive, because they cannot both be true; but they are not complementary, because if some swans are white and some are not, both statements are false. The opposite, or negation, of “All swans are white” is “Not all swans are white”, or “Some swans are not white”. The negation goes both ways, so the negation of “Some swans are not white”, or “Not all swans are white”, is “All swans are white”. You may have noticed that the negation of a universal quantifier is an existential quantifier, and the negation of an existential quantifier is a universal quantifier.
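In logical notation, with $W(x)$ standing for “swan $x$ is white”, the two negation rules above read:

```latex
\neg\, \forall x\; W(x) \;\equiv\; \exists x\; \neg W(x)
\qquad \text{(``not all swans are white'' = ``some swan is not white'')}

\neg\, \exists x\; \neg W(x) \;\equiv\; \forall x\; W(x)
\qquad \text{(``no swan is non-white'' = ``all swans are white'')}
```

Each rule is the mirror image of the other: negating a statement about everything yields a statement about something, and vice versa.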

The mutual exclusivity and complementarity of the universal quantifier and the existential quantifier lie at the heart of the falsifiability of the scientific method. Consider the theory mentioned above: “All swans are white”. How can we know if it is false? Well, if we find a swan that is not white, that is, there exists a swan that is not white, we know the theory must be false. But how can we know if it is true? This is tricky, as we would have to examine all possible swans, from the past, present, and future, to make sure every single one of them is white. That is impossible. In other words, we can’t prove a theory with a universal quantifier to be true; we can only show it to be false by finding a counter-example.

Now, how about this theory: “Some swans are white”? Note that this theory states an existential quantification instead of a universal one. Well, if we find a swan that is white, we have shown the theory to be correct. But can we show that it is wrong? Just as proving that all swans are white requires examining all swans across space-time, proving that no swans are white, which is the negation of “some swans are white”, would also require us to examine all of the swans to make sure that every one of them is not white. So proving the falsity of an existential quantification, just like proving the truth of a universal quantification, is an impossible feat.

So what statements or theories can be considered scientific? In the first episode, I said that what was revolutionary about Newton’s theory of motion is that it is a unified framework which, as far as anyone at the time could demonstrate, all things in the universe must follow. From the beginning of history, humans have liked to see patterns between events. We read stars to predict future events and cast bones for our fortunes, but most of those predictions are more often misses than hits. Newton’s theory of motion offers the comforting certainty we desperately want: using his equations, we can calculate the future returns of Halley’s comet just as certainly as we know the Sun will rise tomorrow. This kind of certainty allows us not only to understand the events of the past but also to produce accurate predictions of the future, which is the key to the reproducibility principle laid out in the last episode. The only way for a scientific theory to provide accurate, reproducible results in the future is for it to be stated with a universal quantifier.

If I theorize that “all swans are white”, and you find a swan, then according to my theory, if the theory is true, I can tell you that it must be white without seeing it myself. This swan theory states a definitive relationship between the species of an animal and its color. Similarly, in Newtonian mechanics, if we know the velocity of a cannonball and how far it is from the ground, I can predict, to astonishing accuracy, all of its future positions before it hits the ground, given that nothing else disturbs it.
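As a minimal sketch of what such a universally quantified theory buys us (the numbers and the function name here are my own, and air resistance is ignored):

```python
# Newtonian kinematics: from the current state of a cannonball,
# predict its height at any future time before it lands.
G = 9.81  # gravitational acceleration in m/s^2

def height_at(t, h0, v0):
    """Height (m) after t seconds, given initial height h0 (m)
    and initial upward velocity v0 (m/s)."""
    return h0 + v0 * t - 0.5 * G * t ** 2

# A cannonball 100 m up, moving upward at 20 m/s: the theory commits
# itself to a definite height at every future instant.
print(height_at(0.0, 100.0, 20.0))            # 100.0
print(round(height_at(2.0, 100.0, 20.0), 2))  # 120.38
```

Because the law is stated for all projectiles at all times, it makes a definite, checkable prediction for this one; a single measured height that disagrees would falsify it.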

What if I theorize that “some swans are white”? If you find a swan, then according to my theory, if the theory is true, what would be the color of the swan? Well, we don’t know. It could be white, black, or even purple for all we know; that is, this theory is almost useless for predicting the future. Freudian analysis, and other types of data interpretation, may give us some insight into what happened in a specific event, but if they cannot be used for future predictions, they are not scientific.

Science tries to find generalized relationships between observations, whether between the current position and velocity of a given rock and its future position and velocity, or between patients’ symptoms and the drugs they should take. A single event tells us little about what the relationship is. During a plague outbreak, it was rumored that Queen Elizabeth had received a ring that purportedly protected her from the disease. It may seem absurd to us that a magical ring could protect against bacterial infection, but Queen Elizabeth never did get the plague, and, as absurd as the argument may sound, it is simply impossible to show the connection between two events to be false based on a single occurrence. It is not just arguments about magic rings and their healing abilities that are problematic; we actually engage in this type of reasoning quite often. Say that I found a new diet online, tried it for two weeks, and afterward was two pounds lighter. Note that I did not say that the diet made me two pounds lighter. Based on the observability principle, I have not observed how the diet affected my weight, only that I took up a new diet and, two weeks later, weighed two pounds less. You may have heard the phrase “correlation is not causation” before. Here, the new diet and the weight loss are only two correlated observations; we have not yet concluded that one caused the other. Can we conclude that the diet caused my weight loss? Well, if we do, then by the same logic we also have to accept that the ring protected Queen Elizabeth from the plague, because we have exactly the same amount of information in the two cases.

So, OK, one occurrence is not enough to prove causation; how about 100 cases? If I post on a forum and find 100 people who have also used the diet and observed an associated weight loss, can we conclude that the diet caused the weight loss? The evidence certainly feels more convincing. To some people, it may even be enough to conclude that the diet works, and they would be willing to try it themselves. As mentioned previously, the logical process from the general to the specific is called deductive reasoning: if all humans are mortal, then this human is also mortal. In contrast, the process from the specific to the general is called inductive reasoning, inference, or proof by experience: if we find one human to be mortal, then another, then another, with enough examples we conclude that all humans are mortal. Inductive reasoning may sound appealing to many people, or even feel intuitive, because it is how we learn new concepts most of the time. As toddlers, we did not learn what the color red was by being told that red is light with a wavelength of 620 to 750 nm. No, we were shown pictures of red apples, red cars, red cubes, and we learned by finding the commonality among them: the specific color shared by all of the pictures. Even mathematics starts this way: most of us never learn the formal definition of addition or multiplication, but whoever got through elementary school learned how to add and multiply just by being given many, many examples and doing many, many exercises.

Inductive reasoning is intuitive, but there is just one problem with it: it doesn’t always lead to true conclusions. In the last episode, I discussed the constant measurement of the speed of light and the discovery of relativity. I will use this example again. Based on our experience, if I am running on a moving train, someone off the train measuring my speed would get the arithmetic sum of the speed of the moving train and my running speed. If we run the experiments with different train speeds and different running speeds, we see that the measurement is always the arithmetic sum of the two, and that is what people observed for centuries. By inductive reasoning, we would conclude that velocities across all frames of reference combine as arithmetic sums. But there is just one problem: the measured speed of light is always constant. You can’t shoot a beam of light off a moving train and get faster light than if you shoot it from the ground. Inductive reasoning has led us to a wrong conclusion. And it is not just the measurement of the speed of light that induction can get wrong.
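The failure of the induced rule can be seen by comparing it with special relativity’s velocity-composition formula, u′ = (u + v) / (1 + uv/c²). A minimal sketch, with function names of my own:

```python
C = 299_792_458.0  # speed of light in m/s

def galilean_add(u, v):
    """The induced everyday rule: speeds simply add."""
    return u + v

def relativistic_add(u, v):
    """Special relativity's velocity-composition formula."""
    return (u + v) / (1 + u * v / C ** 2)

# A runner (3 m/s) on a train (30 m/s): the two rules are
# indistinguishable at everyday speeds.
print(galilean_add(30.0, 3.0))                # 33.0
print(round(relativistic_add(30.0, 3.0), 9))  # 33.0

# But light emitted from the moving train is still measured at c,
# not c + 30 as the induced rule would predict:
print(round(relativistic_add(30.0, C)) == round(C))  # True
```

Centuries of low-speed observations could never distinguish the two formulas, which is exactly why induction from those observations went wrong.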

Machine learning and AI have become hot topics in the recent public consciousness. Most machine learning algorithms are designed to use inductive reasoning to find connections between variables. For instance, you can provide a collection of pictures of scientists and train a machine to learn what kind of people look like scientists. The computer will try to find common features among the pictures and consider those the defining features of scientists. But if we give it examples of scientists who are mostly male, the algorithm will recognize a strong correlation between the male gender and being a scientist, and therefore conclude that males are more likely to be scientists. This is not very different from human reasoning: at the turn of the 20th century, when female doctors were rare, most people had only seen male doctors and could easily conclude by induction that only males are capable of being doctors and that medicine is not a profession for females. In one piece of research, a team at the University of Washington sought to build a machine-learning algorithm that could tell huskies apart from wolves. To their surprise, the algorithm was able to tell huskies and wolves apart about 90% of the time. But the researchers later realized that instead of learning the subtle feature differences between huskies and wolves, what the algorithm had learned was that the images labeled as wolves mostly had snowy backgrounds, while the images labeled as huskies mostly had grassy backgrounds. So, by induction, the algorithm reasoned that if the background has snow, the animal must be a wolf, and if the background has grass, the animal must be a husky. When the researchers digitally altered the images and put a husky onto a snowy background, the algorithm identified the husky as a wolf.
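A toy reconstruction of that failure mode fits in a few lines. The data below is invented for illustration; a real image model learns from pixels, but the inductive shortcut is the same:

```python
from collections import Counter

# Each training example is (background, true_animal). In this invented
# data, wolves happen to be photographed on snow, huskies on grass.
training = [
    ("snow", "wolf"), ("snow", "wolf"), ("snow", "wolf"), ("snow", "wolf"),
    ("grass", "husky"), ("grass", "husky"), ("grass", "husky"),
]

# "Training" by induction: for each background, remember the majority label.
counts = {}
for background, animal in training:
    counts.setdefault(background, Counter())[animal] += 1
rule = {bg: c.most_common(1)[0][0] for bg, c in counts.items()}

print(rule)          # {'snow': 'wolf', 'grass': 'husky'}
# The induced rule never looked at the animal at all, so a husky
# digitally pasted onto a snowy background is confidently called a wolf:
print(rule["snow"])  # wolf
```

The rule fits the training data perfectly, which is precisely why the spurious correlation goes unnoticed until a counter-example arrives.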

It is not just wolves and huskies that an inductively reasoning machine can get wrong. In the United States, some states, such as New York, Wisconsin, California, and Florida, use risk-assessment software called COMPAS in court to assess the risk that a convicted criminal will commit another offense in the next two years. Based on the risk assessment provided by COMPAS, the judge may recommend more severe sentences for higher-risk convicts. In 2016, ProPublica launched a series of investigations into the algorithm and found that blacks were almost twice as likely as whites to be labeled higher risk by COMPAS. Many people, based on their personal inductive reasoning, might agree with the conclusion drawn by the software. But that is not the end of the story. ProPublica also followed up with people assessed by COMPAS in the following years, and found that whites who were labeled lower risk were much more likely than blacks labeled lower risk to go on to commit other crimes; furthermore, only 20 percent of the people predicted to commit violent crimes actually went on to do so.

In the age of big data, it is certainly spooky how much the large internet companies know about us. They can recommend advertisements, news, or shows tailored specifically to our engagement. But I think it is even scarier when we start to blindly trust the predictions of black-box AIs without understanding exactly the inner workings of how those predictions were made. When we start to use AI for medical diagnostics or criminal trials without understanding why those recommendations were made, or even whether they were correct, the mistakes can do the most harm.

Inductive reasoning makes a lot of intuitive sense, and it can be a useful shortcut for our everyday decisions. If I have liked most Disney movies in the past, I can inductively reason that I am going to like the new one coming out, and make plans to catch it in the theater; but, of course, that in no way guarantees I will like the new movie. From a pragmatic point of view, we have only limited information in life, and it is practical to use our past experience and inductive reasoning to make decisions. But in no way can inductive reasoning be used as proof for the validity of our beliefs. As we have seen above, inductive reasoning can lead to prejudiced conclusions that can be extremely harmful to those who are judged wrongly, or to misguided policies, from racial segregation to eugenics, that can cause large-scale suffering.

Science is ultimately concerned with what is true, not what is practical. Therefore we cannot rely on inductive reasoning for scientific inquiry, only deductive reasoning. In addition, we want to use the scientific method to find universal laws that we can use to predict the future. But this leaves us with a dilemma, because we can never prove a universally quantified statement about the real world to be true; we can only show it to be false by contradiction. Indeed, modern scientific discoveries are called theories instead of laws, such as the theory of evolution or the theory of relativity, because scientists in the early 20th century realized that what we had considered truths or universal laws, from the Newtonian laws of motion to the law of supply and demand proposed by Adam Smith, turned out to be not so universal after all. We now use the word theory for scientific discoveries not because they are merely a hunch, but because we understand that even though those theories can pass the most diligent tests we can perform at the moment, we cannot be sure that in the far future, with the aid of more accurate instruments and experimental setups, we will not find counterexamples demonstrating that our theories are false after all.

This is the core of the falsifiability principle. We can never be certain that our theory is right. The scientific method requires systematic doubting, as proposed by René Descartes. Our best chance of showing that a belief is true is to turn over every rock and do everything in our power to find evidence and examples showing that our belief is false. Only when we have tried our hardest to question and doubt what we believe, and have failed, may we have faith that our belief might be true.


In recent years, humans have started to accept the uncertainty of the universe, and instead of seeking scientific theories of certainty, we have shown a preference for using statistical analysis to form theories of uncertainty about the workings of the world. There is a certain appeal to it. There are so many unknowns in the world, and so far it is mostly only in very idealized laboratory settings that we can gather enough information to make accurate predictions. With statistics and probability, we can still make educated and informed decisions with limited information. But statistics can be misleading as well. I will discuss the fallacy of the p-value later, but here is one important note. For a universally quantified theory, when an impossible thing happens, we know our premises are wrong. When I claim all swans are white and you wave a black swan in my face, that is sufficient evidence to show that I was wrong. But if I claim that most swans are white, you can’t prove me wrong by showing me a black swan, or even use it as evidence that my claim is unlikely to be true, however tempting that may feel. Unlikely events happen all the time. People win the lottery or get struck by lightning, but those are not evidence that winning the lottery or being struck by lightning is a common event. The falsifiability of statistical tests works differently from that of theories of certainty; I will pose it as a challenge for the listeners to think about how to use the consistent measurement of the observability principle to falsify a statistical hypothesis.
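To see why one black swan can’t count against “most swans are white”, a quick calculation helps. Assuming, purely for illustration, that 90% of swans are white, the chance of running into at least one non-white swan grows quickly with the number of swans observed:

```python
# Probability of seeing at least one non-white swan among n swans,
# under the hypothetical assumption that 90% of all swans are white.
def p_at_least_one_nonwhite(n, p_white=0.9):
    return 1 - p_white ** n

for n in (1, 10, 50):
    print(n, round(p_at_least_one_nonwhite(n), 3))
# 1 0.1
# 10 0.651
# 50 0.995
```

Someone who has seen 50 swans was almost certain to meet a non-white one even if the claim is true, so the sighting carries essentially no evidence against it.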

Scientific Method: 2. Reproducibility

In this episode, I will talk about the reproducibility principle of the scientific method. I want to start the discussion by defining the word “truth”. In the last episode, I stated that the scientific method is the only way known to humans that leads to objective truth. But if I am to convince you of that, I should probably first define what truth is. When we use the word “truth” in our daily conversations, we usually mean an event we directly observed or a conclusion drawn from those observations. Most of us believe that seeing is believing; we trust our memory and senses and often consider them the most reliable evidence. We also trust our thought processes; after all, that is why we use them to draw the conclusions that guide our daily decisions and actions in the first place, even though quite often we cannot describe our exact reasoning or provide proof for the validity of our rationale.

When we tell our friends about our favorite shows or describe to them in detail how our day went, we are, in a way, translating our sensory experiences into words. This is, in fact, quite similar to using a ruler to tell the length of a table, which also converts experiences in the physical world into symbols. When we converse with our friends about our day, we are making measurements of the world and conveying those measurements to others, so that they may, in part, experience what we have experienced. It may sound radical to consider humans measuring devices of the physical world, but many philosophers would agree. Immanuel Kant even went so far as to claim that human minds contain categories of understanding and are made for examining and understanding the universe. If we define truth as directly observed events, or conclusions drawn from direct observations, what we mean is that truth is just our measurement of the world based on our sensory experiences. But do people experience the same events and yet make different factual claims, or come to different conclusions? Well, quite often, we do.

In the last episode I have discussed that not all measurements are made equal, without a clear and agreed definition or standard, humans can, and often do, looking at the same events and information but comes to vastly different or sometimes even completely opposite conclusions. Consider the following question: “Is the emancipation of the slaves good for freedom?” From the western contemporary perspective, the answer to this question may seem to be bluntly clear, that it should be an uncontested yes. But it is more complicated than that. In the dialogue of Gorgias, Socrates asserted: “no one does evil willingly.” Failing to understand this, has been a great empathy barrier for humans. We often see people who do what we consider as stupid or evil as stupid and evil. But people most of the time were simply doing what they believe to be good by their own definition of goodness. I think language was designed for communication, to make one’s thoughts and reasoning transparent to others. To use language effectively, it can help us understand different perspectives and resolve differences without the need for violence. When we decide to not to use language to communicate and understand but to provoke and ridicule, or simply not to communicate at all, it often leads to great divisions and violent outcomes at the end. During the American civil war, it is easy to view those who fought on behalf of the south as racist or against freedom. But if we read the narratives and the correspondence of confederate soldiers, they too were fighting for freedom, a free market without government intervention, the freedom of sovereign, their economical and political independence, and protection from government confiscation of properties. Freedom means different things to different people and sometimes can even take contradictory forms. Before we pass judgment on others, it is important to first try to understand their definition and perspectives, because they may be different from ours. 
On a side note, the primary crops produced on the plantations were sugar, tobacco, and coffee. Atlantic slavery was the by-product of a free market unbridled by regulation, serving an insatiable consumer desire for things that bring pleasure but not nourishment.

What freedom means is a complicated question. Most of the concepts we deal with every day mean different things to different people. One of the common failures in human communication is that we have a hard time recognizing that others may use the exact same words with definitions different from ours. When we examine others’ arguments using our own definitions, their arguments may seem absurd and nonsensical. But if we put those arguments back into their original context, using the speaker’s definitions, they often have some legitimacy. Failing to understand this principle of communication can make differences of opinion seem unresolvable in ways that they are not. Unfortunately, throughout history, violence often ensues when humans fail to see the legitimacy of others’ perspectives.

I want to share here a quote from What We Owe to Each Other by T. M. Scanlon (you may also know this quote from the show The Good Place):

The reasons we have to treat others only in ways that could be justified to them underlie the central core of morality and are presupposed by all the most important forms of human relationship. These reasons require us to strive to find terms of justification that others could not reasonably reject. But we are not in a position to say, once and for all, what these terms should be. Working out the terms of moral justification is an unending task.

In a way, human history is the struggle of working out the terms of moral justification among different beliefs and ideologies. What does it mean to be just? What does it mean to be free? These questions have been debated throughout history, and too often settled on the battlefield.

When we disagree on what is true, sometimes it is due to differences in definitions; other times, it is our notoriously unreliable perceptions and memories. I want to share two studies that demonstrate this quite well. In the car crash study conducted by psychologists Elizabeth Loftus and John Palmer, participants were shown seven different film recordings of car accidents. After viewing the films, they were asked to estimate the speed of impact. But there was a catch: all participants viewed the same films and were asked almost the same question, differing by a single word. Some were asked the speed at which the cars “smashed” into each other; others were asked the speed at which they “hit” each other. The group asked with the word “smashed” estimated the speed of impact at around 42 miles per hour, while the group asked with the word “hit” estimated it at around 31 miles per hour. Remember, both groups saw identical footage. A week later, the researchers asked the participants to recall whether they had seen any broken glass in the films. About 30% of participants who had been asked with the word “smashed” said they remembered seeing broken glass, while only about 15% of those asked with the word “hit” did. There was no broken glass in any of the films.

Are we better at remembering details of our own lives than of the experiments we participate in? After the tragedy of 9/11 in 2001, scientists conducted a national survey of over 3,000 people from New York City, Washington, D.C., Boston, and four other cities in Connecticut, Missouri, and California. The participants were asked how they felt, what they were doing, and who they were with when they heard the news. Many gave very detailed descriptions and said it was a day and time they would never forget. The researchers followed up with several more surveys over the years. They found that after a year, what people remembered was only about 63 percent consistent with their original testimony; after three years, the consistency was 57 percent. What is even more perplexing is that the participants were firmly convinced their testimony was absolutely accurate. They could not believe their memory was wrong, even when they were given their original statements and shown the contradictions. Some even began to question the accuracy of their original statements, so convinced were they that what they remembered in the moment must be true. Our memory is not so much an infallible recording of what has happened as a reflection that is constantly shaped and reshaped by our current experiences and thoughts. Can we rely solely on our own memory and experience to decide what is true? I am not sure we can.

It is not only the memories and judgment of individuals that can be flawed, but also those of human enterprises as a collective. Throughout most of human history, humans believed that the Sun revolved around the Earth, and considered those who thought otherwise heretical and insane. In pre-Enlightenment Europe, the divine right of kings was a commonly accepted belief: our place in society is ordained by God, this universe is among the best of all possible worlds, and any attempt to change the existing social order would be defiance of God’s will. Of course, from our modern perspective, it was a blatant attempt by the powerful to use divinity to justify their own socio-economic position and maintain the status quo. But it was nonetheless a widely held belief for over a millennium, not only among the kings and lords but also among those they ruled over. Many Shakespearean plays, from Hamlet to Macbeth, are themed around the divine right of kings. It is a good thing that contemporary humans, with the internet to settle all arguments, would never believe anything false, right?

I stated before that the scientific method is the best way known to humanity to find objective truth. What I meant is that when our future observations consistently match the predictions made from our theories, we may call those theories true. One of the key factors that differentiates scientific theories from pseudoscientific ones is whether the theory is used for prediction or for interpretation. Science predicts the future; pseudoscience interprets the past. Sigmund Freud was arguably one of the most famous people of the 20th century. His psychoanalytic theory has shaped many popular ideas over the past century, and it still shapes not just what we think but how we reason as a society. He would listen to a patient’s experiences and history and come up with an analysis that explained the patient’s feelings and actions. If an aggressive patient had not been hugged much as a child, Freud would explain his aggression as an expression of a lack of intimacy. But if the patient had a good childhood with much attention and love, Freud would explain his aggression as an expression of attention-seeking. As long as a Freudian explanation retains some linguistic connection to the patient’s experience, no one can show such an after-the-fact analysis to be logically true or false; some may find the analysis convincing, others may not. The validity of the theory is then completely up to the subjective perspective of the beholder. Similar to Freudian psychoanalysis, racial theory and social Darwinism were developed during this period in an attempt to explain the existing economic and social inequalities in Western societies. Even after a hundred years, these pseudoscientific theories still deeply shape how we view others, and we are still living in their shadows. This is not to say that it is always wrong to interpret and read meaning into the events of the past. Sometimes those interpretations provide important perspectives, but we cannot know whether they are correct. We often fail to see that, with limited information, there can be many legitimate, sensible interpretations of the same events. Only predictions of the future based on our observations can be validated. This is how Albert Einstein introduced general relativity to the world.

The discovery of general relativity is a fascinating story, and I want to tell it from the beginning. Most of us learned in school that light is extremely fast but still travels at a finite speed. That is not a simple conclusion to come to, because in almost all our daily interactions, we see events at about the same time as they unfold in the distance. The moment we see something happen with our own eyes is the moment we consider it to have transpired. We don’t look at the Sun and think, “Oh, that was the Sun 8 minutes and 19 seconds ago.” How could we know whether what we see in the sky is the Sun of 8 minutes ago, 8 seconds ago, or right now? From 1668 to 1678, the Danish astronomer Ole Rømer, who incidentally also devised one of the first calibrated temperature scales, made about sixty observations of the eclipses of Io, the first moon of Jupiter, which Galileo had discovered in the early 1600s. Io orbits Jupiter about every 42.5 hours. Due to the enormous size of the gas giant, on every orbit Io disappears into and then emerges from the shadow of the planet. Using these disappearances and emergences, astronomers can calculate Io’s orbital period. What Rømer found across his sixty observations was that the period seemed to change over the years. Specifically, when the Earth was moving closer to Jupiter, the time between eclipses of Io seemed shorter, and when the Earth was moving away from Jupiter, it seemed longer. Rømer speculated that the period at which Io orbited Jupiter had not changed; what changed was the time between when an eclipse happened and when he saw it. That is, it takes time for us to see events that happen far away. Based on the changes in the observed orbital periods, he estimated the speed of light to be about 220,000 km/s, only about 26% lower than the modern measurement. 
By the way, the telescope Rømer used was extremely low in clarity and magnification, far worse than most hobbyist telescopes we can get in a supermarket today. Anyone with fifty dollars in their pocket can, in principle, verify and reproduce his result. Later, Newton accepted Rømer’s idea, did his own calculation of the speed of light, and estimated that it takes about 7 to 8 minutes for light to travel from the Sun to the Earth; the modern measurement is 8 minutes and 19 seconds.
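Rømer’s reasoning can be checked with simple arithmetic. The sketch below uses the modern values for the mean Earth-Sun distance and the speed of light (figures Rømer did not have; he worked from the geometry of the orbits). It reproduces the 8 minutes 19 seconds travel time, and shows what Rømer’s own estimate of 220,000 km/s would have implied.

```python
# Light travel time from the Sun to the Earth.
AU_KM = 149_597_870.7   # mean Earth-Sun distance (1 AU), km
C_MODERN = 299_792.458  # modern speed of light, km/s
C_ROMER = 220_000.0     # Romer's 17th-century estimate, km/s

def travel_time(distance_km, speed_km_s):
    """Seconds for light to cover the given distance."""
    return distance_km / speed_km_s

t_modern = travel_time(AU_KM, C_MODERN)
minutes, seconds = divmod(t_modern, 60)
print(f"Modern value: {int(minutes)} min {seconds:.0f} s")  # 8 min 19 s

t_romer = travel_time(AU_KM, C_ROMER)
print(f"With Romer's estimate: {t_romer / 60:.1f} min")     # about 11.3 min

# Romer's speed undershoots the modern value by roughly a quarter:
print(f"Undershoot: {1 - C_ROMER / C_MODERN:.1%}")
```

The same division, run in reverse, is exactly how the changing eclipse timings translate into a speed: a known distance divided by an observed delay.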

Since its publication, Newtonian mechanics had been the accepted theory of everything for over 200 years. According to Newton’s theory, if I am running on a moving train and someone off the train measures how fast I am moving, she should measure the sum of the speed of the train and my running speed on the train. Similarly, if light is emitted on a moving train, the speed of light measured by someone on the ground should show the same additive effect of the moving train and the moving light. In the mid-19th century, many scientists tried to measure the change in the speed of light across moving frames of reference, but all failed due to measurement error. The speed of light is so great that there was simply no clock accurate enough, nor train fast enough, to let scientists of the time resolve the difference within their margins of error. In 1887, Albert Michelson and Edward Morley used a device called the Michelson interferometer, hoping to show that the measured speed of light changes with the angle relative to the Earth’s motion around the Sun. But they failed to confirm their hypothesis: every measurement of the speed of light came out the same, regardless of the frame of reference. At first they thought their device was not accurate enough, but they verified the apparatus, repeated the experiments, and got the same results. They concluded that the result was not due to experimental error. When Michelson and Morley first published their results, they attracted much criticism, as the result directly contradicted the widely accepted truth. To many people, Newtonian mechanics was not just a theory that aimed to model the behavior of the universe, but an unbreakable law that all things in the universe must obey. Even though many people did not accept Michelson and Morley’s result, Albert Einstein did. The constancy of the measured speed of light in all frames of reference became the cornerstone of the theory of relativity.
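The contrast between the two rules can be made concrete. Under Newton (the Galilean rule), velocities simply add: w = u + v. Under special relativity they compose as w = (u + v) / (1 + uv/c²), a formula under which light always comes out at c no matter how fast its source moves. A minimal sketch:

```python
C = 299_792.458  # speed of light, km/s

def galilean_add(u, v):
    """Newtonian composition: velocities simply add."""
    return u + v

def relativistic_add(u, v):
    """Special-relativistic velocity composition."""
    return (u + v) / (1 + u * v / C**2)

train = 0.1  # km/s (360 km/h, a fast train)

# A runner at 0.01 km/s on the train: the two rules agree to many digits,
# which is why no 19th-century experiment could tell them apart.
print(galilean_add(train, 0.01), relativistic_add(train, 0.01))

# Light emitted on the moving train:
print(galilean_add(train, C))      # exceeds c under Newton's rule
print(relativistic_add(train, C))  # still c, up to floating-point rounding
```

The last line is, in effect, what Michelson and Morley measured: composing c with any frame velocity returns c.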

The Michelson and Morley experiment was arguably the most important failed experiment in history. It was widely considered a failure because it did not confirm the well-established Newtonian theory. But simply because a theory is well established or widely accepted does not mean it is true. Negative results often get a bad reputation in research, as they are difficult to publish, but they can have tremendous epistemological value. If we carefully scrutinize our experiment and repeatedly get the same negative results, we should not select data to fit existing theories. Instead, we should publish all the results, question the hypothesis, and document the experiment in as much detail as possible, so that other researchers can repeat it, obtain the same results, and verify the falsity of the theory for themselves.

Based on the findings of Michelson and Morley, Einstein postulated that time must dilate in a moving frame of reference to keep the speed of light constant, and from this he developed the theory of relativity. After the publication of the general theory of relativity in 1915, many people were skeptical of Einstein’s theory. After all, relativity does not seem to reflect our everyday experience: we don’t feel time slowing down when we run really fast, or see space bending around heavy objects. To demonstrate the validity of his theory, Einstein waited patiently for a solar eclipse. In 1919, the British astronomers Frank Watson Dyson and Arthur Stanley Eddington carried out two expeditions, one to the West African island of Príncipe and the other to the Brazilian town of Sobral, to observe a total eclipse. Because general relativity postulates that mass distorts space while Newtonian mechanics does not, general relativity predicts a deflection, or bending, of light near massive bodies, which displaces the apparent positions of stars near the Sun during an eclipse in a way Newtonian mechanics cannot account for. During the eclipse, the two expedition teams measured the positions of the stars near the Sun. Afterward, Dyson and Eddington compared the measurements with their own calculations and confirmed that the observations agreed with the prediction made by general relativity, not Newtonian mechanics. The experiment made international news, and Einstein became a superstar overnight.
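The prediction Dyson and Eddington tested can itself be reproduced in a few lines. General relativity gives the deflection of a light ray grazing a massive body as δ = 4GM/(c²R), twice the value a Newtonian corpuscular argument yields; for the Sun this works out to about 1.75 arcseconds. A sketch using the standard physical constants:

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8   # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m
ARCSEC_PER_RAD = 180 / math.pi * 3600

def deflection_gr(mass, radius):
    """GR deflection angle of light grazing a body, in radians."""
    return 4 * G * mass / (C**2 * radius)

delta = deflection_gr(M_SUN, R_SUN) * ARCSEC_PER_RAD
print(f"General relativity: {delta:.2f} arcsec")      # about 1.75
print(f"Newtonian (half):   {delta / 2:.2f} arcsec")  # about 0.87
```

Anyone can run this arithmetic and compare it against the published 1919 measurements, which is precisely the point of the next paragraph.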

The scientific method does not give us a way to convince others of how right we are, but it allows others to verify our claims. With the help of a precise language, the scientific method also makes our knowledge and understanding transferable to others. It wasn’t Einstein who demonstrated the correctness of general relativity; it was Dyson and Eddington who, following the theory Einstein proposed, did their own calculations, conducted their own observations, and confirmed that what they saw matched the prediction. If the observations had contradicted the prediction of general relativity during the solar eclipse of 1919, that would have been the end of the theory. By contrast, pseudoscientific theories such as Freudian analysis can always reinterpret a result to retain some connection to the original theory, as long as the connection sounds convincing. “Nullius in verba”, the motto of the Royal Society, means roughly “take no one’s word for it.” It truly captures the heart of scientific inquiry: truth is not declared by anyone, but should be verifiable by everyone. A theory cannot be called scientific until someone else has reproduced the result, without knowing in advance what the originally published data were, so that there is no confirmation bias. To allow anyone to verify a theory, the scientific method demands reproducibility. The measurements we make need to be consistent, and the logic and reasoning we adopt must be standardized. Anyone with the requisite mathematics can use general relativity to make their own predictions, and everyone should arrive at the same results. Ultimately, truth should not be settled by which argument we find convincing, but by an objective standard: whether an objectively agreed-upon prediction matches an objectively agreed-upon observation. In the last episode, I discussed consistent measurement, the first key to reproducible experiments. In the next episode, I will dive deeper into the second key: standardized logic, reasoning, and mathematics.

In recent years, there has been growing concern about a reproducibility crisis in science, as very little research can be, or has been, reproduced and verified by other scientists. Some of this is caused by the lack of standardized and consistent measurements, but much of it stems from different scientists applying different reasoning to the same analysis. In 2018, Dr. Brian Nosek and his team published the paper “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results.” In this study, 29 teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. All teams were given the same data; that is, they all agreed on the measurements. Yet 20 teams found a statistically significant connection between red cards and skin color, while 9 teams did not. How can math disagree with itself? Because there is no single mathematical definition of the question “whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players,” the 29 teams used 21 different analytic approaches to quantify it, and therefore ended up with different conclusions. Which analysis is correct? Well, it depends on which definition we choose. We cannot agree on the conclusions unless we first agree on the definitions.
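One way to see how honest analysts can reach opposite conclusions from identical data is that reasonable analytic choices, such as whether to pool observations or to stratify them, can flip the apparent direction of an effect (Simpson’s paradox). The numbers below are entirely invented for illustration and have nothing to do with the actual soccer data set:

```python
# Synthetic (made-up) counts of (red cards, appearances), split by an
# analytic stratum such as a league. The choice of analysis flips the answer.
data = {
    "league_A": {"dark": (1, 100),    "light": (20, 1000)},
    "league_B": {"dark": (100, 1000), "light": (12, 100)},
}

def rate(cards, appearances):
    return cards / appearances

# Analysis 1: pool everything, ignoring the stratum.
pooled = {}
for tone in ("dark", "light"):
    cards = sum(data[lg][tone][0] for lg in data)
    apps = sum(data[lg][tone][1] for lg in data)
    pooled[tone] = rate(cards, apps)
print(pooled)  # dark ~9.2% vs light ~2.9%: dark-toned rate looks HIGHER

# Analysis 2: compare within each stratum.
for lg, counts in data.items():
    d, l = rate(*counts["dark"]), rate(*counts["light"])
    print(lg, f"dark {d:.1%} vs light {l:.1%}")  # dark rate LOWER in both
```

Both analyses are plain arithmetic on the same numbers; they differ only in the operational definition of “more likely,” which is exactly the kind of choice the 29 teams made differently.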

In the next episode, I will talk about logic, the standardized way of reasoning; its counterpart, fallacy; and Karl Popper’s final insight into the scientific method: falsifiability.

On Racial Disparity and Nature vs Nurture

In the past couple of weeks, racial tension and unrest have raged through the United States. As an outsider, I hope to understand the problem better. So here I am, sharing some of the things I have learned in the past couple of weeks.

To be able to solve a problem, the first step is to understand it. So what is racism? According to Wikipedia, racism is the belief that groups of humans possess different behavioral traits corresponding to physical appearance, and can be divided based on the superiority of one race over another. But linking physical appearance and behavioral traits is not always wrong. For example, penguins have small wings relative to their bodies and therefore cannot fly; linking penguins’ inability to fly with their apparent small wings is justified. I think when we talk about racism in the U.S., we are often talking about an unjustified belief linking physical appearance and behavioral traits, such as skin color and aggression. But is this belief justified?

Unfortunately, some data, such as income and crime statistics, may seem to suggest it is. African Americans constitute some of the lowest-income groups in the U.S., and even though African Americans form 13% of the population, they account for about 50% of murder convictions. Readers of my previous posts will notice that using these data to conclude that skin color causes aggression or low economic standing is pseudoscientific reasoning that mistakes correlation for causation. We can argue that African Americans’ sociological and economic position is the result of centuries of oppression and disenfranchisement. But the income and crime data above can be explained by both racial superiority theory and social injustice theory. So to show that it is wrong to link skin color to sociological and economic position, we simply need better data.

Before I continue, I recommend reading this post by Vox, as I will use the data presented there in what follows. First, I will try to explain, based on my understanding, what racial superiority theory is. As much as we disagree with the theory, I think it is crucial to understand it as charitably as possible. To show its falsity, we should show the inconsistencies within the theory based on evidence. After all, if we do not base our beliefs on reasoning and evidence, then all beliefs are philosophically equal, every belief is right, and therefore no belief is.

So let us try to understand what racial superiority theory is. Racial superiority theory, at least its reincarnation after Darwin’s theory of evolution, rests on a couple of hypotheses. First, there are two potential factors deciding every person’s sociological and economic position and behavioral traits: nature versus nurture. Nature is what we are born with; though not restricted to it, we mostly associate nature with genetics. Nurture is everything else: the family and society we are born into, our experiences, and so on. This leads to the second hypothesis: in the United States (I say the United States because we are discussing the situation there), a person’s sociological and economic position is decided mostly by their genetics, or nature, while outside forces have very little effect.

The second hypothesis is the one most people take issue with. But it follows naturally from another belief: that the U.S. is the land of equal opportunity. In a land of equal opportunity, success is determined only by one’s talent, not by any force external to the person. Of course, if one’s success is solely determined by nature, we could conclude that a certain race is more successful in society because it is the superior race. But is one’s success mostly determined by genetics?

Now to the data presented in that post. The study measures social mobility across generations, that is, shifts in income level across generations, controlled by race and gender. It brilliantly separates genetic factors from societal ones, as I will attempt to demonstrate here.

The study presents the following two charts, comparing the incomes of black and white males and females while controlling for parental household income. The top chart shows that, coming from the same household income rank, black males earn significantly less than white males. The bottom chart shows that no such gap exists between black females and white females, though both earn less than white males from the same parental household income rank.

If we think that success (or income) in society is mostly decided by genetics, then we can use income as a measure of, for lack of a better term, genetic superiority (bear with me here). If that were true, then two people at about the same income level, regardless of physical appearance, should have similar natural talent as well. Given that genetics is passed down across generations, in a society of equal opportunity we would expect children of parents at similar income levels (regardless of physical appearance) to end up at similar income levels themselves. But that is not what the data show. For parents at the same income level, children who are black males rank about 10 percentiles lower in income than white males. The gap between females is much smaller, most likely because females as a group also suffer discrimination, their incomes being lower than those of white males across the board, though I am making conjectures here.

What makes racism difficult to tackle is that race is not a cause of aggression, but has unfortunately become a predictor of crime in our current society. Unlike in a physical system, where the measurement and the experimental setup can be separated, human behavior is subject to self-fulfilling prophecies: by believing in and imagining correlations that were untrue, and treating people as if they were true, we can actually make them true over time. In social phenomena, our subjective opinions can become objective facts when enough people believe them for long enough. We should pay attention to how we imagine the world to be, because we do have the power to shape it through our imagination, for better and for worse. We should be careful about what we imagine the world into, whether it is the story we tell about race or the story we tell about gender.

I think that as a species we have so much power and influence, yet we can easily fall into flawed reasoning. We would like to think that the pseudosciences of Freudian psychoanalysis and social Darwinism are ideas of the past, but pseudoscientific thinking still insidiously influences our values and choices. It is still common in everyday reasoning to use correlation as evidence of causation and to interpret data to fit our hypotheses. If we believe the world is profoundly unjust and want to change it for the better, I think a good start is with our own beliefs: to ask which of the ideals and values we hold today may be hurtful to others, and how we can use the scientific method to test them. Readers interested in learning more about the difference between science and pseudoscience, and between logic and fallacy, may start with Karl Popper’s Conjectures and Refutations. You can also refer to my earlier posts for more information.


To end this post, I will leave the reader with a quote from Viktor E. Frankl’s Man’s Search for Meaning:

“Freedom, however, is not the last word. Freedom is only part of the story and half of the truth. Freedom is but the negative aspect of the whole phenomenon whose positive aspect is responsibleness. In fact, freedom is in danger of degenerating into mere arbitrariness unless it is lived in terms of responsibleness. That is why I recommend that the Statue of Liberty on the East Coast be supplemented by a Statue of Responsibility on the West Coast.” – Viktor E. Frankl, Man’s Search for Meaning

The Attack on MSF in Afghanistan

Yesterday there was an attack on a maternity ward inside a hospital in Kabul, Afghanistan. The hospital is operated by Doctors Without Borders (Médecins Sans Frontières, MSF). From the information at hand, it would seem that the attackers intentionally targeted mothers at their most vulnerable.

I don’t know if I need to add more detail for the reader to see that this is an atrocious act that would wrench most hearts. It is certainly an evil act. But after reading the news, I felt stuck, not knowing what to do next. My intuition was to just move on to the next piece of news. After all, there is always more information online to consume. Why should I dwell on literally yesterday’s news, allowing it to affect my emotions, when there is other news more relevant to my life and far more entertaining things that could make me happy? I can’t do anything for them anyway.

But I am not so sure that I should follow my intuition. I don’t want to believe that there is nothing I can do to help, and I don’t want to believe that the goal of my life should be to find my own happiness.


In The Wealth of Nations, Adam Smith stated that “No society can surely be flourishing and happy, of which the far greater part of the members are poor and miserable.” I don’t know whether, as humans, we owe each other anything. But I want to believe that we share the world, and therefore we are in this together. Given contemporary news coverage, it is difficult to imagine that the purpose of journalism is to provide context and information to help the public make informed decisions. Far too many pieces of news report numbers and statistics without any details on how the numbers were collected or how the statistical conclusions were drawn.

I often find myself in a very unhealthy relationship with the news. I am always desperately pursuing new information, yet I rarely verify the news I read, nor do I take the time to process the information and see how it should guide my actions. I want that to change. While researching this attack, I also learned about another attack, in 2015. On 3 October 2015, the U.S. Air Force struck an MSF trauma center in Afghanistan. At least 42 people were killed and over 30 were injured. The statement given was that “there may have been collateral damage to a nearby medical facility.” Even though the U.S. military eventually admitted to the attack, it was characterized as an accident, and no one was criminally charged.


I want to take a pause here. Let us not just move on. It is wondrous that so many would risk their lives to help others, and it is monstrous that what threatens them comes from other human beings. I know such news feels distant and unrelated to our daily concerns, especially with everything going on in our own lives. But I believe news is not there merely to entertain us; it also informs us that there are people out there who need our help, and whom we can help. A couple of weeks ago, I found a Chrome extension called Tab for a Cause. When you open a new tab, it displays a few ads on the side, and the income from those ads goes to a charity the user chooses. I think it is a wonderful example of how, even if each of us cannot make a big difference alone, many of us together can. By choosing to help, we are no longer bound by the tragedy of the commons. I also think that we should help. We would like to think that we are not responsible for others’ suffering and therefore have no obligation to help. But if we can help yet choose not to, doesn’t that make us responsible? We like to condemn the attackers, for they are obviously responsible for the suffering. But the truth is, to blame one group is to exonerate all others, and by extension, ourselves. If we can help, yet choose not to, we share responsibility for the suffering.

Let’s not let yesterday’s news become just yesterday’s news. Take the time to think about what we can do to help. If you have some spare change, MSF is always in need of donations to operate in the increasingly unstable Middle East; even a couple of dollars can make a huge difference in someone’s life. If not, you can always install the Tab for a Cause extension, a very simple way to help. I do believe that humans are at our best when we work together.