Starting in middle school, I was required to learn physics, chemistry, and biology. In college, I had to study social science and political science to fulfill my general education credits. In retrospect, I have been studying science for almost all my life. But only about two years ago did I start asking questions about the nature of science. I realized that I did not know what the formal definition of the scientific method was, and I had no idea why it should tell me the truth about the world around me. I know that physics and chemistry are science, while history and literature are art. But those are just examples of science. What makes physics science, and history not? Is the difference in the use of experiments? Physicists conduct experiments, but so do many modern historians, who use carbon dating to determine the age of historical documents. Furthermore, alchemists also relied heavily on experimentation, yet we usually do not consider alchemy a science. Throughout this series, I will talk about the different facets that define science and present arguments for why we think it works. Whether you are reading or listening, I hope it can help you understand how scientific discoveries are made, and help you differentiate between what is science and what is pseudoscience.
The word science comes from the Latin scientia, meaning knowledge. The scientific method is a systematic way of producing knowledge about the natural world. But the skill of knowledge-making is not unique to humans. For instance, chimpanzees pass down the knowledge of termite fishing across generations, and crows can understand water displacement and use stones to extract water from deep bottles. But there is no doubt that humans have developed the most sophisticated methods for knowledge-making, far beyond those of any other known life form. From the languages we use, such as the formal names for elements and chemical compounds, and even mathematics, to measuring instruments, some as simple as rulers and clocks, others extraordinarily complex, like the Hubble Space Telescope orbiting the Earth, we have come a long way since the dawn of humanity, and we have learned wonders about the nature of knowledge itself.
Before physics, there was natural philosophy; before astronomy, there was astrology; before chemistry, there was alchemy. We often think that what differentiates science from pseudoscience is the subject of inquiry. But that is actually not the case. Natural philosophy and physics both study everyday phenomena, astrology and astronomy both study the celestial bodies, and alchemy and chemistry both study the properties of and interactions between materials and matter. The true difference between science and pseudoscience (at least as Karl Popper defined it) lies in the methodology. The scientific method I refer to throughout this series is the method used in modern physics. In other fields, such as medicine, psychology, and sociology, the methodologies adopted are not exactly the same. I will do my best to present arguments and evidence, as a scientist myself, to demonstrate why we believe that the scientific method is the only way known to humans that leads to the objective truth of nature.
Isaac Newton is often referred to as the father of modern science. In his book Mathematical Principles of Natural Philosophy, Newton not only introduced calculus to the world (the subject that many college students dread) but also proposed a unified theory of motion demonstrating the existence of universal laws that govern not only the celestial bodies but also everything else we interact with in our daily lives. The most revolutionary part of his theory, besides the derivation of how the planets revolve around the sun, is the idea of universal laws: there exist immutable laws that can describe everything in this world, and nothing is exempt from them. With those laws, we can not only predict the future but even control and shape it to our own ends.
But what is not well known is that besides his foundational contributions to physics, Isaac Newton was also an active alchemist. According to his writings, he actively experimented in search of the philosopher's stone, which he believed could transmute base metals into gold and was the essential ingredient in creating the Elixir of Life. Newton devoted much of his life to the search for the fabled stone.
The scientific method has been evolving throughout history, and the methods we use in different fields will keep evolving with our understanding of the subjects and of knowledge itself. Even the greatest minds in history, such as Newton, could and did fall into believing pseudosciences such as alchemy and astrology. It is dangerous to presume that any of us would somehow be exempt from those mistakes. Therefore I think it is important, whether as a student, a researcher, or just an enthusiast who occasionally reads science news, to think beyond the truth as simply what we have been taught and told: to not only understand what the scientific method is, but also contemplate why we think it leads to objective truth, or even whether it leads to the truth at all, and finally to be able to come to conclusions on our own terms based on solid logic and evidence. I am glad that you are here to take this journey with me.
The first facet of the scientific method I will discuss is observability. Let’s start with a thought experiment:
Suppose we are given a coin. We flip it again and again, and every time we perform this test, the coin gives about 70% heads and 30% tails.
Given this observation, we form two different conclusions.
Conclusion one: there is a manufacturing defect in the coin that is influencing the chances of heads and tails.
Or conclusion two: The coin is fair. But there is a supernatural force that is interfering with our experiments.
From our modern perspective, it is easy to argue that conclusion one is more scientific than conclusion two, because we think manufacturing defects are simply more likely than supernatural forces. But actually, neither conclusion is scientific, because neither was directly observed in the experiments. The only thing we have witnessed and measured through this coin toss test is that the coin produces 70% heads and 30% tails, repeatedly. And nothing more. We observed neither the manufacturing quality of the coin nor any supernatural forces.
Science is empirical. We can only make statements about what we have observed. As for things that we have not measured, it is not the job of science to fill in the blanks with theories that guess why something happened. The observability of the scientific method means that, in science, we can and should only make claims about what we observe and measure. Interpreting or inferring beyond the data is not part of the scientific method. The question of what can be observed is the very foundation of the uncertainty principle in quantum mechanics, first described by Heisenberg.
So imagine there is a basketball lying on a field. When we observe the basketball, what we actually see is the light that bounces off the ball. We know that photons, the particles of light, carry momentum; therefore, when they bounce off the basketball, they also exert forces on the ball. But because the basketball is so massive in comparison to photons, the force exerted on the basketball by the light is not enough to move the ball. Now shrink the basketball smaller and smaller, until we can no longer ignore the force exerted by each photon. Then, when we bounce a photon off the ball, we can see where the ball was at the moment of impact, but we have also inevitably moved the ball by hitting it with the photon. This is the core idea of the uncertainty principle.
To measure any physical object, especially a quantum particle, we have to interact with it using other objects. In the example above, that measuring object is a photon. This interaction will inevitably change the current state of the object we are trying to measure. In the context of measuring electrons, for example, to acquire a more accurate reading of the electron's position, we need to use photons with higher energies, which in turn will change the electron's velocity more during the impact. Therefore, there is a trade-off between the certainty of measuring the electron's position and its velocity: a more accurate measurement of the position leads to more uncertainty in the velocity, and vice versa. Hence, the uncertainty principle.
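This trade-off is usually written as a simple inequality. As a standard textbook result (not something derived in this essay), the uncertainties in position and momentum satisfy:

```latex
\Delta x \, \Delta p \ge \frac{\hbar}{2}
```

Here Δx is the uncertainty in position, Δp the uncertainty in momentum (mass times velocity), and ħ the reduced Planck constant. Squeezing Δx smaller forces Δp to grow, exactly as in the photon-and-ball picture above.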
You may have heard of Schrödinger's cat before. It is probably the most famous allegory of quantum superposition, featured in many shows, from The Big Bang Theory to numerous YouTube videos. Schrödinger's cat aims to give a large-scale analog of the double-slit experiment. A simplified version of the double-slit experiment can be described as follows: consider a light source capable of emitting a single photon at a time. In front of the light source, we place a dark sheet with two slits in it, 1 and 2, a relatively small distance apart. Behind the sheet, we have a screen that shows us where each photon lands. If we block one slit, we see that the photon hits the screen at a position behind the other, open slit. But if both slits are open, the screen shows an interference pattern, as if the single photon had passed through both slits at the same time. This pattern of bright and dark stripes is completely different from simply adding the results of the two single-slit tests, and it seems to suggest that the photon passed through both slits at once. The Copenhagen interpretation states that we can therefore consider the photon to be in a superposition of passing through slit 1 and slit 2 at the same moment. Schrödinger's cat is a thought experiment demonstrating this interpretation on a larger scale: consider a cat closed in a box with a poison-releasing system, such that if the photon passes through slit 1, the system releases the poison and kills the cat, and if the photon passes through slit 2, it does not. Before we observe which slit the photon has passed through, the cat can then, by analogy, be considered to be in a superposition of dead and alive at the same time.
Even though Schrödinger's cat has become a paradigmatic example used by pop science to explain quantum mechanics to the general public, it was originally proposed as a criticism of the superposition interpretation. To quote Schrödinger: “One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along with the following device, … the entire system would express this by having in it the living and dead cat (pardon the expression) mixed or smeared out in equal parts.” Schrödinger was critical of the existence of the living-dead cat. And Einstein agreed.
Einstein later wrote to Schrödinger: “You are the only contemporary physicist, …, who sees that one cannot get around the assumption of reality if only one is honest. Most of them simply do not see what sort of risky game they are playing with reality—reality as something independent of what is experimentally established… Nobody really doubts that the presence or absence of the cat is something independent of the act of observation.”
Indeed, simply because we have not seen the cat in the box does not mean the cat is both dead and alive at the same time; similarly, simply because we have not seen the photon crossing the slits does not mean it passed through both slits at the same time. Reality does not depend on our observation. It is risky to make guesses about what we have not seen, because if our guess is wrong, how can we know?
Then if the photon did not pass through just slit 1, or just slit 2, or both at the same time, how did it get to the screen? The simplest answer is: we do not know, because we cannot see it. As Einstein pointed out, it is a dangerous game to make statements about unobserved reality, or to interpret data and extrapolate information beyond what we have measured. When we make a prediction and it contradicts our observation, we know our prediction is wrong. But when we interpret data and fill in the blanks about the cause of what we have seen, how can we know our interpretation is correct? Yes, we can have many different guesses, and some might make more sense to us than others; manufacturing defects, for example, might make more sense to us than supernatural forces. But at the end of the day, what makes sense to us is subjective, shaped by each person's experience and beliefs. There is no objective way to tell which guess is better than the others. The result of the unfair coin toss may be due to a defect in the coin, it may be due to a magnet hidden somewhere we cannot see, or it could be some other reason we do not know about. We can say which explanation seems more likely to us, but we do not have enough information to decide which one is true. Therefore, we need to differentiate what we have observed from our guesses about what has happened. When doing science, for the sake of objectivity, we should only deal with observations. We do not see the light traveling through the air, or which slit it passes through, only where the photon was measured on the screen. Therefore, we cannot, and should not, make any claim about how it passed through the two slits. It is not in the business of science to speculate about what might have happened or what would have happened. For this reason, medical diagnostics is generally not considered an exact science.
Doctors measure blood pressure, order lab work and MRI scans, and then use those measurements to theorize about the underlying cause of the patient's symptoms, while the cause itself is often not directly observable or measurable. But those diagnoses do usually rely on scientific discoveries that are based on direct observation.
Take diabetes diagnosis as an example. A type 1 diabetes diagnosis comes with the conclusion that the patient's pancreas produces little or no insulin. But the doctor usually only measures the blood sugar level and blood insulin level after fasting. We do not directly observe how insulin is produced by the pancreas. But given that the pancreas is the only organ we know of that produces insulin, it is natural to conclude that the pancreas is failing when the blood insulin level is low. Of course, a low insulin level could also mean the insulin has been consumed by the body to process sugar; that is why fasting is required before insulin tests. How do we know the pancreas produces insulin in the first place? In 1889, two physiologists, Oskar Minkowski and Josef von Mering, removed the pancreases from a couple of lab dogs and later observed symptoms of diabetes in those dogs. They therefore suggested that the pancreas plays a crucial role in regulating blood sugar, which ultimately led to the discovery of insulin. In 1916, Nicolae Paulescu developed an aqueous pancreatic extract that was successfully used to normalize the blood sugar of diabetic dogs. Unfortunately, this extract was not pure enough and could cause severe allergic reactions when used on humans. In 1922, five researchers from the University of Toronto, Banting, Best, Collip, Macleod, and FitzGerald, created the first purified pancreatic extract, also known as insulin, and sold the patent for a symbolic one dollar, so that it could only be used to benefit patients, not any company that might hold a monopoly over this life-saving drug. But unfortunately, the American companies that later obtained the rights to insulin were able to extend their patents and hold an oligopoly over the insulin market in the U.S., where the price of insulin is about 10 times the price in Canada, where it was originally discovered almost a hundred years ago.
Sometimes experiments produce results that agree with our theory, and sometimes they do not. When we conduct an experiment and do not get the result we expect based on our hypothesis, it is easy to speculate that something went wrong with the experimental setup. Modern experiments, from chemical reactions to medical tests, often require complex procedures, and they can be, and often are, subject to mistakes. When we get a result that contradicts our hypothesis, it could be due to a mistake in conducting the experiment, or it could simply be that the hypothesis is wrong. Just as we cannot make any claims about whether the cat is alive or dead before we open the box, we cannot say whether the mismatch between the result and our prediction is due to experimental error or a false hypothesis. An honest scientist reports all of the experimental data, both positive and negative, and makes no attempt to explain away the negative results.
I have talked about observation and measurement for a while now. But what constitutes a valid observation or measurement?
Anyone who has ever used a ruler has performed a measurement. There are two types of measurement: numerical and categorical. A numerical measurement, like what we get from using a ruler, produces a numerical value as its result, while a categorical measurement produces a category. For example, when we fill in a survey about our nationality, ethnicity, or the school we graduated from, those are all categorical measurements.
But simply because a measurement produces a number or a category does not mean it is good enough for science. Measurements also need to be standardized and consistent. For example, when we measure the length of a table, we expect the same number no matter whether the measurement is done by a human or a machine. In our everyday life, we are so accustomed to machines making measurements for us that we hardly even think about it anymore. We no longer measure time based on the angle of the Sun relative to the horizon, but simply read it off our cell phone screens. When we drive, our car measures the number of miles we have driven and tells us how fast we are going. The reason we trust those machines to perform measurements for us is that we know they are consistent. Every day at the same time, every phone will give us the same hour and minute in the same time zone. On the highway, cars can maintain their distance when driving at the same speed limit, because the readings of their speedometers are consistent and standardized across all cars. When we measure the same thing twice, we expect the same results.
When we measure length and time, everyone agrees on what a meter or a second is, as these are defined by international standards. But when we talk about other types of measurement, such as the unemployment rate, things get murkier. According to the U.S. Bureau of Labor Statistics, there are six different standards for calculating the unemployment rate, U-1 to U-6. For example, the U-3 standard counts anyone who has worked at least one paid hour in the past week as employed, while the U-6 standard counts anyone who was not given enough work hours as unemployed. So the U-6 standard will naturally give a higher unemployment rate than the U-3 standard. Unfortunately, many unemployment reports in the news omit which standard the rate was measured under. When comparing numbers from news or experiments, whether unemployment rates at different times or mortality rates of different diseases, we should first make sure the numbers were measured using the same standard. Just as we cannot compare 1 meter with 2 seconds, or the U-3 unemployment rate with the U-6, we cannot make any meaningful comparison when the statistics have different units. A 20% unemployment rate under the U-6 standard does not mean 20% of Americans are out of a job, and a 5% unemployment rate under the U-3 standard does not mean only 5% of people lack full-time jobs. Before we try to draw conclusions from those numbers, we should first try to understand how the numbers are defined and measured.
Here is another example. Many sociological and psychological studies are interested in human happiness. But how can we measure human happiness, which is inherently subjective and invisible? Instead of directly measuring people's happiness, what researchers usually measure is what people say about how happy they are. These are called proxy variables. Proxy variables can give us insights into things that may not have objective definitions, but we should not mistake a proxy variable for the original variable of interest. What people say about how happy they are is not the same as how happy they really are. Similar proxy variables are also used in personality tests and intelligence tests. We do not directly measure human personality or intelligence, but rather how people respond to hypothetical scenarios and how they perform on a set of human-designed tests. But does an objective measurement of human personality or intelligence truly exist? Or are personality and intelligence just human constructs that we made up to make sense of the world, constructs that do not reflect reality? If so, then isn't it ludicrous to categorize all humans into a small set of personalities, or to compare human intelligence by reducing such a complex concept to just a couple of numbers?
We have so far defined what good measurements are, but how do we deal with measurement errors? When we measure the length of an object with a ruler, we might get slightly different numbers just because we align the ruler a bit differently each time. Furthermore, as stated by the uncertainty principle, there is a fundamental limit to how accurately we can measure both the position and the momentum of a particle. But if there is inherent uncertainty, how can we repeat the same experiment and always expect the same results? Well, it depends on what we are measuring. When dealing with probability, a consistent measurement is not concerned with the behavior of a single case, but with the distribution of the aggregate. Coming back to our coin toss example: if we toss a coin 8 times in a row, we will most likely get a different sequence of heads and tails with every set of 8, which means that the measurement of the heads-and-tails sequence is not consistent. What is consistent is the measurement of the probability: when we toss the coin enough times, we can get a very close estimate of the chance of heads. Similarly, in quantum mechanics, when we measure the position of a photon in the double-slit experiment, what interests us is not the position of each photon, but the position distribution of many, many photons.
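The difference between an unrepeatable sequence and a repeatable aggregate is easy to see in a quick simulation. This is only an illustrative sketch; the 70/30 bias is the one assumed in the coin example, and the seed and function names are my own choices:

```python
import random

random.seed(42)  # fixed seed so this sketch is reproducible

def toss(n, p=0.7):
    """Toss a coin with heads probability p, n times; True means heads."""
    return [random.random() < p for _ in range(n)]

# Two runs of 8 tosses will usually produce different sequences...
run_a, run_b = toss(8), toss(8)
print(run_a)
print(run_b)

# ...but the estimated probability of heads stabilizes as n grows.
estimate = sum(toss(10_000)) / 10_000
print(estimate)  # lands close to 0.7
```

The individual 8-toss sequences are not reproducible, but the aggregate estimate is: run this with any seed and the 10,000-toss estimate will land near 0.7.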
Let us take a deeper look into how to measure probabilities. I claimed that the coin had a 70% chance of landing heads and a 30% chance of landing tails. How can we confirm this with measurements? If we flip the coin once and get heads, does that mean the coin has a 100% chance of heads, and my claim was wrong? Of course not. We cannot make a probabilistic claim with a sample size of one. But what about two? Now we might get one head and one tail; that's 50% and 50%. Does that mean the coin is fair? Again, no. I will skip the math, but the chance for a 70/30 coin to give one head and one tail in two tosses is about 42%. Also, the only probabilities we can possibly measure with two tosses are 0%, 50%, and 100%. We need more data, a lot more data. When we flip the coin 10 times, we have a 27% chance of getting 7 heads and 3 tails. That is still not very high. How about 100 times? We have an 8.7% chance of getting exactly 70 heads and 30 tails. Wait a moment, the probability just dropped! It may seem paradoxical at first, but consider the following scenario: two people measure the length of a short bridge. If we ask them to measure to an accuracy of a meter (about three feet), it is much more likely that their measurements will match exactly than if we ask them to measure to a centimeter (less than half an inch). That is, the more accuracy we demand, the less likely the results will match exactly. But if we agree to accept some margin of error before the measurement, then it is more likely that the measurement can be reproduced. So let us make an agreement: instead of requiring the measurement to be exactly 70%, you will accept my claim as true if the measurement falls between 69% and 71%. So, what is the chance of getting 69 to 71 heads if we flip the coin 100 times? It turns out to be only about 26%. That is, there is a major chance your measurement will contradict my claim about the coin after 100 flips, even if I was telling the truth!
How about 1000 times? The chance of getting 690 to 710 heads is about 53%. Better, but still not great. What if we want to be 99% sure that I wasn’t lying about the coin? Well, it would require about 15,000 flips. That is a lot of coin flips. It does put drug trials with hundreds of patients into perspective, doesn’t it?
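All of the percentages above can be checked directly from the binomial distribution, using nothing but the Python standard library. A minimal sketch (the helper names `binom_pmf` and `prob_between` are mine, not standard functions):

```python
from math import comb

def binom_pmf(k, n, p=0.7):
    """Probability of exactly k heads in n tosses of a coin with heads probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_between(n, lo, hi, p=0.7):
    """Probability that the head count falls in the range [lo, hi]."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

print(binom_pmf(7, 10))              # ~0.27: exactly 7 heads in 10 tosses
print(binom_pmf(70, 100))            # ~0.087: exactly 70 heads in 100 tosses
print(prob_between(100, 69, 71))     # ~0.26: 69 to 71 heads in 100 tosses
print(prob_between(1000, 690, 710))  # ~0.53: 690 to 710 heads in 1000 tosses
```

Note how demanding an exact count (8.7% at 100 flips) is far less reproducible than accepting a small error band, and how the band's probability climbs as the number of flips grows.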
So does this mean we just need to perform millions of tests and we can always get an accurate measurement of the probability? Not quite; certain events are so unlikely that even millions of trials may not be enough. Here is an example: let's say I bought a lottery ticket and hit the jackpot. It is unlikely, but every day hundreds of millions of people buy lottery tickets, and thousands upon thousands of winners are drawn. It is not impossible, merely improbable, that I hit the jackpot with my first ticket. Now, suppose I am interested in measuring the likelihood of someone hitting the jackpot. So I buy another ticket, and I don't win this time. Does that mean the jackpot win rate is 1 in 2? Again, no; the sample size is too small. So I buy 1,000 more tickets: no jackpot. 1,000,000 more tickets: still no jackpot. Does that mean the chance of winning the jackpot is 1 in a thousand, 1 in a million, or 1 in a billion? We don't know, because we only have one positive sample. If we bought 1 million tickets and won 1 thousand times, we could be quite confident that the chance of winning is about 1 in a thousand. But when we only have one or a few positive samples, it is impossible to make an accurate measurement of the probability of a positive result. Therefore, in addition to having a large sample size, we also need a significant number of positive results to accurately measure the probability of those positive results. Unfortunately, for most scientific research, data collection is expensive. Many research papers are published with relatively small sample sizes and maybe a handful of positive results.
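The lottery point can be sketched numerically as well. With a made-up win rate of 1 in 1,000, a million simulated tickets yield roughly a thousand wins, enough positive samples for a stable estimate; with a made-up win rate of 1 in 100 million, the same million tickets usually yield none (both rates are hypothetical, chosen only for illustration):

```python
import random

random.seed(1)  # fixed seed so this sketch is reproducible

def count_wins(p_true, n_tickets):
    """Simulate buying n_tickets, each winning independently with probability p_true."""
    return sum(random.random() < p_true for _ in range(n_tickets))

common = count_wins(1 / 1_000, 1_000_000)      # roughly 1,000 wins
rare = count_wins(1 / 100_000_000, 1_000_000)  # almost always 0 wins

# Plenty of positive samples -> a usable estimate of the win rate.
print(common, common / 1_000_000)
# Zero or one positive sample -> no usable estimate at all.
print(rare)
```

The first estimate lands near the true 1-in-1,000 rate, while the second gives no information beyond "the event is rare": a million trials with no positives cannot distinguish 1 in 10 million from 1 in 10 billion.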
There is a common misconception that the uncertainty in quantum mechanics is the ultimate cause of the uncertainty in the events of our daily life. When we measure the polarization of two identical photons, we can get different results. But if we give someone a drug she has had an allergic reaction to, odds are she will have another allergic reaction. The results of measuring photons are random, but the result of getting an allergic reaction to a drug is usually deterministic. When we say a drug is safe 99% of the time, we mean that 1 in a hundred people who use it will experience negative effects, not that anyone who uses the drug 100 times will get one allergic reaction. The majority of people using it will probably never experience any side effects, while a small number of people will get an allergic reaction every time. We cannot be certain when measuring quantum particles because we are limited by using photons as our measuring instrument. But the uncertainty in our daily life is generally due to the limitations of the models we use to predict the future. When a doctor prescribes a medicine, why does it work for some patients while triggering allergic reactions in others? It is because we do not have an accurate physiological model that includes all the factors needed to predict how a patient will react to the drug. It is like trying to calculate the top acceleration of a car given only the type of gasoline and the accelerations of 100 other cars of different sizes, shapes, and manufacturers, without knowing much about the car's engine or gears. We can, of course, try to build a more accurate physiological model that better captures human reactions to drugs. Many personalized medicine researchers aim to tackle this question, instead of developing drugs that will probably work for most people, probably most of the time.
Being able to make consistent measurements is crucial to the reproducibility of the scientific method, which I will discuss next time, along with the relationship between science and mathematics. As we have seen, error exists in all measurements. When we repeat the same experiment and get different results, instead of concluding that the results contradicting our hypothesis are caused by experimental error, we should think about how to redefine our measurement to incorporate the unavoidable uncertainties in future experiments, whether performed by us, by other scientists, or even by machines, so that anyone who repeats the same experiment can and will get consistent outcomes.
Before I end this episode, I want to leave you with a parable. A man was given a quantum coin. It uses nanotechnology that entangles itself with future events. He was told that by using it, he could predict the result of any future football game. But, due to the complexity of the coin, it has some quirks: it has to be tossed in just the right way to make accurate predictions. The quirk is difficult to describe or measure; instead, he knows he has used the coin correctly when its predictions turn out right. So, he used the coin to predict 20 games, 10 of which it predicted correctly, and 10 of which it did not. Because he knew the quirk of the coin, he knew the 10 mispredictions were due to his manual error, and therefore outliers. He published a paper on this nanotech coin using the 10 games for which he knew he had used it correctly. The peers who reviewed the paper all marveled at the accuracy of the predictions. The paper was published, and people were inspired by the technological breakthrough of this coin. But does the coin really work?



