Scientific Method: 2. Reproducibility

In this episode, I will talk about the reproducibility principle of the scientific method. I want to start the discussion by defining the word “truth”. In the last episode, I stated that the scientific method is the only way known to humans that leads to objective truth. But to convince you that this is the case, I should probably first define what truth is. When we use the word “truth” in everyday conversation, we usually mean something we observed directly, or a conclusion drawn from such observations. Most of us believe that seeing is believing: we trust our memory and senses and often consider them the most reliable evidence. We also trust our thought processes; after all, that is why we use them to draw the conclusions that guide our daily decisions and actions in the first place, even though we often cannot describe our exact reasoning or prove that it is valid.

When we tell our friends about our favorite shows or describe in detail how our day went, we are, in a way, translating our sensory experiences into words. This is quite similar to using a ruler to find the length of a table, which also converts experiences of the physical world into symbols. When we tell our friends about our day, we are making measurements of the world and conveying those measurements to others, so that they may, in part, experience what we have experienced. It may sound radical to consider humans as measuring devices of the physical world, but many philosophers have held similar views. Immanuel Kant went so far as to claim that the human mind contains categories of understanding through which it examines and understands the universe. If we define truth as directly observed events, or conclusions drawn from direct observations, what we mean is that truth is simply our measurement of the world based on our sensory experiences. But do people ever experience the same events and yet make different factual claims or come to different conclusions? Well, quite often, we do.

In the last episode, I discussed how not all measurements are made equal. Without a clear and agreed-upon definition or standard, humans can, and often do, look at the same events and information but come to vastly different or sometimes even completely opposite conclusions. Consider the following question: “Is the emancipation of the slaves good for freedom?” From a contemporary Western perspective, the answer may seem plainly clear: an uncontested yes. But it is more complicated than that. In the dialogue Gorgias, Socrates asserted: “no one does evil willingly.” Failing to understand this has been a great barrier to empathy. We often see people who do what we consider stupid or evil as stupid and evil. But most of the time, people are simply doing what they believe to be good by their own definition of goodness. I think language was designed for communication, to make one’s thoughts and reasoning transparent to others. Used effectively, language can help us understand different perspectives and resolve differences without the need for violence. When we decide not to use language to communicate and understand, but to provoke and ridicule, or simply not to communicate at all, it often leads to deep divisions and, in the end, violence. Looking back at the American Civil War, it is easy to view those who fought on behalf of the South as racist or against freedom. But if we read the narratives and correspondence of Confederate soldiers, they too saw themselves as fighting for freedom: a free market without government intervention, sovereignty, economic and political independence, and protection from government confiscation of property. Freedom means different things to different people and can sometimes take contradictory forms. Before we pass judgment on others, it is important to first try to understand their definitions and perspectives, because they may differ from ours.
On a side note, major plantation crops included sugar, tobacco, and coffee. Atlantic slavery was a by-product of a free market, unbridled by regulation, that satisfied an insatiable consumer desire for goods that bring human pleasure but not nourishment.

What freedom means is a complicated question. Most of the concepts we deal with every day mean different things to different people. One common failure in human communication is that we have a hard time recognizing that others may have definitions different from ours even when we use the exact same words. When we examine others’ arguments using our own definitions, their arguments may seem absurd and make no sense at all. But if we put those arguments back into their original context, using the speaker’s definitions, they often do have some legitimacy. Failing to understand this principle of communication can make differences of opinion seem unresolvable when they are not. Unfortunately, throughout history, violence has often ensued when humans failed to see the legitimacy of others’ perspectives.

I want to share a quote here from What We Owe to Each Other by T.M. Scanlon (you may also know this quote from the show The Good Place):

The reasons we have to treat others only in ways that could be justified to them underlie the central core of morality and are presupposed by all the most important forms of human relationship. These reasons require us to strive to find terms of justification that others could not reasonably reject. But we are not in a position to say, once and for all, what these terms should be. Working out the terms of moral justification is an unending task.

In a way, human history is the struggle of working out the terms of moral justification among different beliefs and ideologies. What does it mean to be just? What does it mean to be free? Those terms have been questioned and debated throughout history and, ultimately, settled on the battlefield.

When we disagree on what is true, sometimes it is due to differences in definitions, while other times it is due to our notoriously unreliable perceptions and memories. I want to share two studies that I think demonstrate this quite well. In the car crash study conducted by psychologists Elizabeth Loftus and John Palmer, participants were shown seven film clips of car accidents. After viewing each film, they were asked to estimate the speed of the cars at impact. But there was a catch: all participants viewed the same films and were asked almost the same question, differing by a single word. Some were asked how fast the cars were going when they “smashed” into each other; others were asked how fast they were going when they “hit” each other. The group asked with the word “smashed” estimated the speed at about 41 miles per hour, while the group asked with the word “hit” estimated it at about 34 miles per hour. Remember, both groups saw identical footage. In a follow-up experiment, the researchers asked participants a week later whether they had seen any broken glass in the film. About 30% of participants who had been asked with the word “smashed” said they remembered seeing broken glass, while only about 15% of those asked with the word “hit” did. There was no broken glass in the film.

Are we better at remembering the details of our own lives than those of the experiments we participate in? After the tragedy of 9/11 in 2001, researchers conducted a national survey of over 3,000 people from New York City, Washington, D.C., Boston, and four other cities in Connecticut, Missouri, and California. Participants were asked how they felt, what they were doing, and who they were with when they heard the news. Many gave very detailed descriptions and said it was a day and time they would never forget. The researchers followed up with further surveys over the years. They found that after one year, what people remembered was only about 63 percent consistent with their original testimony. After three years, the consistency was 57 percent. What’s even more perplexing is that participants were firmly convinced their current recollections were accurate; they couldn’t believe their memory was wrong, even when shown their original statements and the contradictions. Some even started to question the accuracy of their original statements, because they were so convinced that what they remembered at the moment must be true. Our memory is not so much an infallible recording of what happened as a reflection that is constantly shaped and changed by our current experiences and thoughts. Can we rely solely on our own memory and experience to decide what the truth is? I am not sure we can.

It is not only the memories and judgments of individuals that can be flawed, but also those of humanity as a collective. Throughout most of human history, people believed that the Sun revolved around the Earth, and considered those who thought otherwise heretical or insane. In pre-Enlightenment Europe, the divine right of kings was a commonly accepted belief: our place in society was ordained by God, this universe was among the best of all possible worlds, and any attempt to change the existing social order would be defiance of God’s will. Of course, from our modern perspective, it was a blatant attempt by the powerful to use divinity to justify their own socioeconomic position and maintain the status quo. But it was nonetheless a widely held belief for over a millennium, not only among kings and lords but also among those they ruled over. Many of Shakespeare’s plays, from Hamlet to Macbeth, are themed around the divine right of kings. Good thing that contemporary humans, with the help of the internet to settle all arguments, would never believe anything false, right?

I stated before that the scientific method is the best way known to humanity to find objective truth. What I mean is that when our future observations consistently match the predictions made from our theories, we may call those theories true. One of the key factors that differentiate scientific theories from pseudoscientific ones is whether the theory is used for prediction or only for interpretation. Science predicts the future; pseudoscience interprets the past. Sigmund Freud was arguably one of the most famous people of the 20th century. His theory of psychoanalysis shaped many popular ideas of the past century, and it still shapes not just what we think but how we reason as a society. He would listen to a patient’s experiences and history and come up with an analysis that explained the patient’s feelings and actions. If an aggressive patient had not been hugged much as a child, Freud would explain his aggression as an expression of a lack of intimacy. But if the patient had a good childhood with much attention and love, Freud would explain his aggression as an expression of attention seeking. As long as a Freudian explanation retains some linguistic connection to the patient’s experience, no one can logically show such after-the-fact analyses to be true or false. Some may find the analyses convincing and others may not, so the validity of the theory is left entirely to the subjective perspective of the beholder. Like Freudian psychoanalysis, racial theory and social Darwinism were also developed during this period, with the aim of explaining the existing economic and social inequalities in Western societies. Even after a hundred years, those pseudoscientific theories still deeply shape how we view others in our society, and we are still living in their shadows. This is not to say that it is always wrong to interpret and read meaning into the events of the past.
Sometimes, those interpretations can provide important perspectives, but we cannot know whether our interpretations are correct. We often fail to see that, with limited information, there can be many different legitimate and sensible interpretations of the same events. Only predictions of the future based on our observations can be validated. This is how Albert Einstein introduced general relativity to the world.

The discovery of general relativity is a fascinating story, and I want to tell it from the beginning. Most of us learned in school that light is extremely fast but still travels at a finite speed. This is not an obvious conclusion to come to, because in almost all of our daily interactions, we see distant events at essentially the same moment they unfold. The moment we see something happen with our own eyes is the moment we consider it to have happened. We don’t look at the Sun and think, “Oh, that was the Sun 8 minutes and 19 seconds ago.” So how can we know whether what we see in the sky is the Sun of 8 minutes ago, 8 seconds ago, or right now? From 1668 to 1678, the Danish astronomer Ole Rømer, who incidentally also devised an early temperature scale that influenced Fahrenheit’s thermometer, made about sixty observations of the eclipses of Io, the innermost of Jupiter’s four large moons. Io orbits Jupiter about every 42.5 hours; Galileo discovered it in 1610. Because Jupiter is so large, on every orbit Io disappears into, and then emerges from, the planet’s shadow. By timing these disappearances and emergences, astronomers can calculate Io’s orbital period. What Rømer found in his sixty observations was that the period seemed to change over the course of the year. Specifically, when Earth was moving closer to Jupiter, the time between eclipses of Io seemed shorter, but when Earth was moving away from Jupiter, it seemed longer. Rømer reasoned that the period at which Io orbited Jupiter had not changed; what had changed was the amount of time between when an eclipse happened and when he saw it. That is, it takes time for light from distant events to reach us. Based on the changes in the observed periods, his data yielded an estimate for the speed of light of about 220,000 km/s, only about 27% lower than the modern value of about 299,792 km/s.
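Rømer’s reasoning can be made concrete with a little arithmetic. If each eclipse happens a little farther from Earth than the previous one, its light has extra distance to cover, so the observed gap between eclipses grows by the true period times v/c. The Python sketch below assumes the Earth-Jupiter distance changes at a constant rate equal to Earth’s orbital speed (about 29.8 km/s), which is an upper bound that ignores the actual orbital geometry, and it uses the modern speed of light rather than any value available to Rømer.

```python
# Sketch: how a changing Earth-Jupiter distance stretches or shrinks the
# *observed* time between eclipses of Io, even if Io's true period is fixed.
# Assumption: the distance changes at a constant rate equal to Earth's
# orbital speed, an upper bound that ignores orbital geometry.

C_KM_S = 299_792.458        # modern speed of light in vacuum, km/s
IO_PERIOD_S = 42.5 * 3600   # Io's orbital period, about 42.5 hours, in seconds
RANGE_RATE_KM_S = 29.8      # assumed rate of change of Earth-Jupiter distance, km/s


def apparent_period(true_period_s: float, range_rate_km_s: float) -> float:
    """Observed time between eclipses.

    range_rate_km_s > 0 means Earth and Jupiter are moving apart. Each eclipse
    then happens range_rate * period km farther away than the previous one,
    and its light needs that extra distance / c seconds to reach us.
    """
    extra_distance_km = range_rate_km_s * true_period_s
    return true_period_s + extra_distance_km / C_KM_S


receding = apparent_period(IO_PERIOD_S, +RANGE_RATE_KM_S) - IO_PERIOD_S
approaching = apparent_period(IO_PERIOD_S, -RANGE_RATE_KM_S) - IO_PERIOD_S

print(f"moving apart: each eclipse seen about {receding:.0f} s later than expected")
print(f"moving closer: each eclipse seen about {-approaching:.0f} s earlier than expected")
```

A shift of roughly 15 seconds per orbit is too small to notice from one eclipse to the next, but it accumulates over many orbits. Between the times when Earth is nearest to and farthest from Jupiter, the extra distance is about the diameter of Earth’s orbit, which light crosses in roughly 1,000 seconds at the modern speed; that kind of accumulated discrepancy is what careful eclipse timings can reveal.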
By the way, the telescope Rømer used for his observations had far lower clarity and magnification than most hobbyist telescopes we can buy in a supermarket today. Anyone with 50 bucks in their pocket can, in theory, verify and reproduce his result. Later, Newton accepted Rømer’s idea, did his own calculation of the speed of light, and estimated that it takes about 7 to 8 minutes for light to travel from the Sun to the Earth; the modern value is 8 minutes 19 seconds.
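The travel times in this passage can be checked with simple division: time equals distance divided by speed. The Python sketch below assumes a mean Earth-Sun distance of one astronomical unit (149,597,870.7 km) and compares the modern speed of light with the roughly 220,000 km/s estimate mentioned above. It is only an illustration of the arithmetic, not a reconstruction of how Rømer or Newton actually did their calculations.

```python
# Light travel time from the Sun to Earth for two values of the speed of light.
AU_KM = 149_597_870.7          # mean Earth-Sun distance (1 astronomical unit), km
C_MODERN_KM_S = 299_792.458    # modern speed of light in vacuum, km/s
C_ROMER_KM_S = 220_000.0       # approximate estimate derived from Rømer's data, km/s


def travel_time_s(distance_km: float, speed_km_s: float) -> float:
    """Seconds needed to cover distance_km at speed_km_s."""
    return distance_km / speed_km_s


def as_min_sec(seconds: float) -> tuple[int, int]:
    """Split a duration into whole minutes and (rounded) remaining seconds."""
    total = round(seconds)
    return total // 60, total % 60


modern = travel_time_s(AU_KM, C_MODERN_KM_S)
romer = travel_time_s(AU_KM, C_ROMER_KM_S)
shortfall = 1 - C_ROMER_KM_S / C_MODERN_KM_S  # how far below the modern value

print("modern value: %d min %d s" % as_min_sec(modern))
print("220,000 km/s estimate: %d min %d s" % as_min_sec(romer))
print(f"the estimate is {shortfall:.1%} below the modern value")
```

With these inputs, the modern value reproduces the familiar 8 minutes 19 seconds, while 220,000 km/s gives a little over 11 minutes, so Newton’s 7 to 8 minutes was actually closer to today’s figure.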

After its publication, Newtonian mechanics was the accepted theory of everything for over 200 years. According to Newton’s theory, if I am running forward on a moving train and someone standing beside the tracks measures how fast I am moving, she should get the speed of the train plus the speed at which I am running on it. Similarly, if light is emitted from a moving train, the speed of that light measured by someone on the ground should be the speed of light plus the speed of the train. In the mid-19th century, many scientists tried to measure such a change in the speed of light in a moving frame of reference, but all failed because of large measurement errors. Light is so fast that there was simply no clock accurate enough, or train fast enough, to let scientists at that time measure a difference in the speed of light between frames of reference within the margin of error. In 1887, Albert Michelson and Edward Morley used a device called a Michelson interferometer, hoping to show that the speed of light changes depending on its direction relative to the Earth’s motion around the Sun. But they failed to confirm their hypothesis: they found no difference in the speed of light, whichever direction the light traveled. At first, they thought their device was not accurate enough, but they checked it, repeated the experiment, and got the same results. They concluded that their result was not due to experimental error. When Michelson and Morley first published their results, they attracted much criticism, because the result directly contradicted a widely accepted truth. To many people, Newtonian mechanics was not just a theory that aimed to model the behavior of the universe, but an unbreakable law that everything in the universe must obey. Even though many people didn’t accept Michelson and Morley’s result, Albert Einstein did.
The constancy of the measured speed of light in all frames of reference became the cornerstone of the theory of relativity.
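To see why the expected effect was so hard to detect, and what relativity changed, the Python sketch below compares two rules for combining speeds along a single straight line. The Newtonian rule simply adds them. The second rule, (u + v) / (1 + uv/c²), is Einstein’s velocity-addition formula from special relativity; it is not stated in this episode and is included here only for comparison. The train and running speeds are hypothetical values chosen for illustration.

```python
# Compare Newtonian (Galilean) and relativistic addition of collinear speeds.
# Train and running speeds are hypothetical illustration values.

C = 299_792_458.0  # speed of light in vacuum, m/s


def newtonian_add(u: float, v: float) -> float:
    """Newtonian rule: a speed u measured inside a frame moving at v adds to v."""
    return u + v


def relativistic_add(u: float, v: float) -> float:
    """Special-relativistic velocity addition for motion along one line."""
    return (u + v) / (1 + u * v / C**2)


TRAIN = 30.0             # hypothetical train speed, m/s (about 108 km/h)
RUNNER = 5.0             # hypothetical running speed on the train, m/s
EARTH_ORBIT = 29_800.0   # Earth's approximate orbital speed, m/s

runner_newton = newtonian_add(RUNNER, TRAIN)
runner_einstein = relativistic_add(RUNNER, TRAIN)
light_newton = newtonian_add(C, EARTH_ORBIT)
light_einstein = relativistic_add(C, EARTH_ORBIT)

# At everyday speeds the two rules agree to many decimal places.
print(runner_newton, runner_einstein)
# Newtonian expectation for light: a fractional change of only about 1e-4.
print(f"{light_newton / C - 1:.2e}")
# Relativistic rule: light still moves at c (up to floating-point rounding).
print(light_einstein / C)
```

The first line suggests why nothing seems odd in daily life, the second shows how small the Newtonian effect on light would be even at Earth’s orbital speed, and the third shows the constancy Einstein took as his starting point.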

The Michelson and Morley experiment was arguably the most important failed experiment in history. It was widely considered a failure because it did not confirm the well-established Newtonian theory. But a theory being well established or widely accepted does not necessarily make it true. Negative results often get a bad reputation in research, as they are difficult to publish. But they can have tremendous epistemological value. If we carefully scrutinize our experiment and repeatedly get the same negative results, we should not select data to fit existing theories. Instead, we should publish all the results, question the hypothesis, and document the experiment in as much detail as possible, so that other researchers can repeat it, get the same result, and verify the falsity of the theory for themselves.

Building on Michelson and Morley’s finding, Einstein postulated that time must dilate in a moving frame of reference to compensate for the constant speed of light, and in 1905 he published the special theory of relativity; a decade later, he extended it into a theory of gravity. After the general theory of relativity was published in 1915, many people were skeptical. After all, the theory does not seem to reflect our everyday experience: we don’t feel time slowing down when we run really fast, or see space bending around heavy objects. To demonstrate the validity of his theory, Einstein waited patiently for a solar eclipse. In 1919, British astronomers Frank Watson Dyson and Arthur Stanley Eddington organized two expeditions, one to the West African island of Príncipe and the other to the Brazilian town of Sobral, to observe the total eclipse. Because general relativity postulates that mass distorts space while Newtonian mechanics does not, general relativity predicts that light passing near a massive body is deflected, shifting the apparent positions of stars next to the Sun during the eclipse by about twice as much as a Newtonian calculation allows. During the eclipse, the two expedition teams measured the positions of the stars near the Sun. Afterward, Dyson and Eddington compared the measurements to their own calculations and concluded that the measurements agreed with the prediction of general relativity, not Newtonian mechanics. The result made international news, and Einstein became a superstar overnight.
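The competing predictions tested in 1919 can be written as simple formulas. For a light ray passing a body of mass M at closest distance b from its center, general relativity predicts a deflection angle of 4GM/(c²b), while treating light as particles falling under Newtonian gravity gives half of that, 2GM/(c²b). The Python sketch below evaluates both for a ray just grazing the Sun, using rounded textbook values for the constants; it illustrates the size of the effect and does not reproduce the expedition teams’ actual data reduction.

```python
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30    # mass of the Sun, kg
R_SUN = 6.957e8     # radius of the Sun, m (closest approach of a grazing ray)
C = 2.998e8         # speed of light, m/s
RAD_TO_ARCSEC = 180 / math.pi * 3600  # radians -> arcseconds


def deflection_gr(mass_kg: float, impact_m: float) -> float:
    """General-relativistic light deflection 4GM/(c^2 b), in radians."""
    return 4 * G * mass_kg / (C**2 * impact_m)


def deflection_newtonian(mass_kg: float, impact_m: float) -> float:
    """Deflection of a 'light particle' under Newtonian gravity, 2GM/(c^2 b), in radians."""
    return 2 * G * mass_kg / (C**2 * impact_m)


gr = deflection_gr(M_SUN, R_SUN) * RAD_TO_ARCSEC
newton = deflection_newtonian(M_SUN, R_SUN) * RAD_TO_ARCSEC

print(f"general relativity: {gr:.2f} arcsec")
print(f"Newtonian estimate: {newton:.2f} arcsec")
```

With these inputs, the sketch gives about 1.75 arcseconds for general relativity and about 0.88 arcseconds for the Newtonian estimate, a difference of less than one arcsecond that the eclipse photographs had to resolve.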

The scientific method does not give us a way to convince others of how right we are. But it allows others to verify our claims. With the help of precise language, the scientific method also lets us make our own knowledge and understanding transferable to others. It wasn’t Einstein who demonstrated the correctness of general relativity, but Dyson and Eddington, who followed the theory Einstein proposed, did their own calculations, conducted their own observations, and confirmed that what they observed matched the prediction. If the observations during the solar eclipse of 1919 had contradicted the prediction of general relativity, that would have been the end of the theory. In comparison, pseudoscientific theories such as Freudian analysis can always reinterpret a result to retain some connection to the original theory, as long as the connection sounds convincing. The motto of the Royal Society, “Nullius in verba”, meaning “on no one’s word”, truly captures the heart of scientific inquiry: truth is not declared by anyone but should be verifiable by everyone. A theory cannot be called scientific until someone else has reproduced the result without knowing the originally published data in advance, so that there is no confirmation bias. To allow anyone to verify our theory, the scientific method demands reproducibility. The measurements we make need to be consistent. The logic and reasoning we adopt should be standardized. Anyone who learns the required mathematics can use general relativity to make their own predictions, and everyone should calculate the same results. Ultimately, truth should not be settled by which argument we find convincing, but by an objective standard: whether an objectively agreed prediction matches an objectively agreed observation. In the last episode, I discussed consistent measurement, which is the first key to reproducible experiments.
In the next episode, I will dive deeper into the 2nd key: standardized logic, reasoning, and mathematics.

In recent years, there has been growing concern about a reproducibility crisis in science, as relatively few published studies can be, or have been, reproduced and verified by other scientists. Some failures are caused by a lack of standardized and consistent measurements, but many arise because different scientists doing the same analysis may use different reasoning. In 2018, Dr. Brian Nosek and his colleagues published the paper “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results”. In this research, 29 teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. All of the teams were given the same data, so they all started from the same measurements. Yet 20 teams found a statistically significant association between red cards and skin tone, while 9 teams didn’t. How can math disagree with itself? Because there is no mathematical definition of the question “are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?”, each team had to choose how to quantify it; across the 29 teams, the analysts used a variety of statistical models and 21 unique combinations of covariates, and therefore ended up with different conclusions. Which analysis is correct? Well, it depends on which definition we choose. We can’t agree on the conclusions unless we first agree on the definition.
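A toy calculation shows how two reasonable definitions can disagree on the same data. The Python sketch below uses invented counts for two hypothetical player groups, A and B, split by a hypothetical second variable, playing position. These numbers are not from the Nosek study, and the sketch is not a reconstruction of any team’s analysis. Metric 1 compares red cards per game overall; metric 2 compares the groups only within each position.

```python
from fractions import Fraction

# Invented counts for illustration only (NOT data from the study):
# (group, position) -> (red cards, games played)
COUNTS = {
    ("A", "defender"): (20, 1000),
    ("B", "defender"): (10, 500),
    ("A", "forward"): (2, 200),
    ("B", "forward"): (8, 800),
}


def pooled_rate(group: str) -> Fraction:
    """Metric 1: red cards per game for a group, ignoring position."""
    cards = sum(c for (g, _), (c, _n) in COUNTS.items() if g == group)
    games = sum(n for (g, _), (_c, n) in COUNTS.items() if g == group)
    return Fraction(cards, games)


def within_position_ratios() -> dict[str, Fraction]:
    """Metric 2: ratio of A's rate to B's rate among players in the same position."""
    positions = sorted({pos for _, pos in COUNTS})
    return {
        pos: Fraction(*COUNTS[("A", pos)]) / Fraction(*COUNTS[("B", pos)])
        for pos in positions
    }


pooled_ratio = pooled_rate("A") / pooled_rate("B")
print(f"metric 1, pooled A/B ratio: {float(pooled_ratio):.2f}")
print("metric 2, within-position A/B ratios:",
      {pos: float(r) for pos, r in within_position_ratios().items()})
```

Pooled over all games, group A appears to receive about 32% more red cards per game; compared position by position, the two groups receive red cards at exactly the same rate, because in these invented numbers group A simply plays more games as defenders, who get more cards. Neither metric is an arithmetic mistake; they answer differently defined questions.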

In the next episode, I will talk about logic, the standardized way of reasoning, its counterpart, fallacy, and Karl Popper’s final insight into the Scientific Method, Falsifiability.
