We all know exactly what “causality” means. “Causality” is when it rains and the ground gets wet; it is when I push my chair and my chair moves; it is when I drink too much and have a headache the next day. Plato postulated that all concepts have some essential properties. A thing belongs to a concept if it satisfies all of the concept’s essential properties. For example, a shape is called a triangle if it satisfies two properties: 1. The shape consists of three line segments. 2. The three line segments connect at their endpoints to form a closed shape. Everything that is a triangle must satisfy those properties, while everything that is not a triangle must fail at least one of them. In mathematical terms, the essence of a concept is the necessary and sufficient condition for the concept, otherwise known as its definition.
Definition, Necessary and Sufficient Conditions
For readers who are unfamiliar with necessary and sufficient conditions: a necessary condition for something is a condition that must be present for that thing to occur. For example, a necessary condition for being human is having human DNA. But having human DNA is not sufficient to be called a human. My blood cells have human DNA (at least I believe so), but they are certainly not humans. A sufficient condition for something is a condition that is enough to guarantee the presence of the thing. For example, thanks to the 14th Amendment, being born in the United States is a sufficient condition for being a U.S. citizen. But it is not a necessary one: many who are born outside the United States become U.S. citizens later in life.
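The two definitions can be checked mechanically over a list of cases. The sketch below is only an illustration: the function names and the tiny case lists are invented here, and they encode the DNA and citizenship examples above as pairs of (condition holds, thing is present).

```python
def is_necessary(cases):
    """The condition is necessary if the thing never occurs without it."""
    return all(condition for condition, present in cases if present)


def is_sufficient(cases):
    """The condition is sufficient if the thing always occurs with it."""
    return all(present for condition, present in cases if condition)


# Hypothetical cases: (has human DNA, is a human)
# a person, a blood cell, a rock
dna_cases = [(True, True), (True, False), (False, False)]

# Hypothetical cases: (born in the U.S., is a U.S. citizen)
# a citizen by birth, a naturalized citizen, a non-citizen
birth_cases = [(True, True), (False, True), (False, False)]

dna_result = (is_necessary(dna_cases), is_sufficient(dna_cases))
birth_result = (is_necessary(birth_cases), is_sufficient(birth_cases))

print("human DNA   -> necessary, sufficient:", dna_result)
print("born in U.S. -> necessary, sufficient:", birth_result)
```

Under these made-up cases, human DNA comes out necessary but not sufficient, and birth in the United States comes out sufficient but not necessary.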
As an essentialist, Plato believed that essence precedes existence. That is, all linguistic concepts had their essence or definition first, and then objects that satisfy those essences came to be. For a lot of mathematical concepts, this seems to be the case. In nature, there doesn’t seem to be an example of a perfect triangle, but it is also hard to argue that humans invented the triangle the first time we connected three line segments together. Gottlob Frege asked a similar question in The Foundations of Arithmetic: did the concept of number exist before humans first started counting? But I will come back to Frege later.
Let us assume Plato’s essentialist position first. “The rain caused the ground to be wet”, “I caused the chair to move”, and “The drink caused my headache” are examples of “causality”. But what is the definition or essence of “causality”? If we look at the definition provided by a dictionary [https://www.merriam-webster.com/dictionary/cause]:
Cause: sufficient reason
So, according to the dictionary, A is the cause of B if A is a sufficient condition for B. As a boolean expression: (not A) or B is true. If the reader is not familiar with boolean expressions, this one means that A cannot be true (happened) while B is false (did not happen). If A is a sufficient condition for B, then whenever A happens, it will “cause” B to happen, so we cannot have the case that A happens while B does not. But the boolean expression shows something interesting. Because we have observed B (we have seen the wet ground and the moving chair, and have experienced the headache), we know that B is true, and so (not A) or B is always true, no matter what A is. Here is something really important:
After-effect analysis, that is, the interpretation of existing data with the aim of explaining the relationship between observed events, is always logically true (or, as Karl Popper calls it, irrefutable), regardless of how much sense it makes to us. (For elaboration on this, I recommend Karl Popper’s Conjectures and Refutations on the difference between pseudoscience and science.)
I know different people have different opinions on whether the validity of a theory should be based on logic or on how much sense it makes. But as a scientist and a mathematician, to me personally, validity is decided by logic rather than by intuition or the senses.
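The claim that (not A) or B is true for every A once B has been observed can be verified by listing all four cases. This is a minimal sketch; the helper name `implies` is my own choice and not part of any library.

```python
from itertools import product


def implies(a, b):
    """Material implication: 'A is sufficient for B' read as (not A) or B."""
    return (not a) or b


# Full truth table over both truth values of A and B.
truth_table = {(a, b): implies(a, b) for a, b in product([False, True], repeat=2)}

for (a, b), value in truth_table.items():
    print(f"A={a!s:5} B={b!s:5} (not A) or B = {value}")

# Once B is observed to be true, the expression holds whatever A is.
always_true_given_b = all(implies(a, True) for a in (False, True))
print("True for every A when B is true:", always_true_given_b)
```

The only row that comes out false is A true with B false, which is exactly the case an observation of B rules out.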
Correlation and Causation
But I think the reader would agree with me that, when we see wet ground, there is something different between saying “The rain has caused the ground to be wet” and saying “My dream last night has caused the ground to be wet”, even though both are logically true statements given that we have observed the wet ground. The difference, I argue, is the result of a thought experiment that we conduct unconsciously. When we observe the wet ground, we imagine that if it hadn’t rained, the ground would not have been wet. But if I had dreamed a different dream, the ground would still be wet. That is, even though we have only observed that B (wet ground) is true, our mind also tries to construct a scenario in which A (rain/dream) is false, and asks whether B would still have been observed. “The ground would not be wet if there had been no rain”, “The chair would not have moved if I had not pushed it”, “I would not have had the headache if I had not drunk last night”. But we should be very careful when we make those arguments, as they are based on imagined data, not on observation itself, so whatever conclusions we draw are just conjectures. Of course, if I had a time machine, I could go back to last night’s party, have all the same interactions except for the drinks, and see if I still get the headache the next day. In labs, we do this all the time, not with a time machine but with controlled variables: we vary the variables one by one and repeat the experiment to see if we get a different result. If so, we conclude that the variable currently being varied is the “cause” of the original result. The reader has probably heard the phrase “correlation is not causation”. In many cases, we only observe events A and B occurring together, i.e. A and B are correlated. To be able to make any claims about “causation”, we need to construct controlled experiments that remove A while keeping all other factors constant, to see if B still occurs.
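The controlled-variable procedure can be sketched with a toy world in which we are allowed to rerun events. Everything here is an assumption made for illustration: the function `ground_is_wet` is an invented model of the world (rain or a sprinkler wets the ground, dreams do nothing), and `changes_outcome` is a hypothetical helper that flips one variable while holding the others fixed.

```python
def ground_is_wet(rain, dream_of_rain, sprinkler):
    """Invented toy world: only rain or a sprinkler wets the ground."""
    return rain or sprinkler


def changes_outcome(world, observed, variable):
    """Flip one variable, keep every other factor constant, compare outcomes."""
    counterfactual = dict(observed)
    counterfactual[variable] = not observed[variable]
    return world(**observed) != world(**counterfactual)


# What we actually observed: it rained, I dreamed of rain, no sprinkler ran.
observed = {"rain": True, "dream_of_rain": True, "sprinkler": False}

rain_matters = changes_outcome(ground_is_wet, observed, "rain")
dream_matters = changes_outcome(ground_is_wet, observed, "dream_of_rain")

print("Removing the rain changes the outcome:", rain_matters)
print("Removing the dream changes the outcome:", dream_matters)
```

The sketch only works because the toy world can be called twice with different inputs. With real events we get a single run, which is why the counterfactual remains a conjecture unless an experiment can stand in for the rerun.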
Without controlled experiments, we should be cautious when trying to argue causality from correlation using thought experiments, because they are not based on observations of controlled experiments but on extrapolation from past experience. It is extremely rare that we have experiences in which the only differing factors are the cause and effect that we are trying to draw conclusions about. Often, there are far more variables in play than two. As an example, when comparing COVID-19 to the Spanish Flu, it is hard to isolate their effects. In 1918, the majority of Europe was ravaged by war and many people were suffering from starvation. Working conditions were poor and modern medicine was still in its infancy. Even when comparing data on H1N1 and other influenza-like illnesses nowadays to COVID-19, it is still difficult to draw any meaningful conclusions, because the criteria for testing are drastically different, treatments are different, and how society has responded to them is also very different.
Here is another example. Let us take a look at education and crime rates in the United States. From the statistics we can see a correlation between lower education and higher crime rates among African Americans and Hispanics compared to white people. It is easy to conclude that it is different genes that have caused the gaps. We can construct a thought experiment stating that if they were more genetically similar to white people, they would be smarter, get more education, and be better citizens. This argument dominated the mainstream for hundreds of years. Even nowadays, it still appeals to many people. Besides, this thought experiment seems completely impossible to disprove: we can’t just change someone’s race. Oh but wait, we can. 100 years ago, only Protestant descendants of the British were considered white. Italians and Irish were not. During the great migration of the early 20th century, those immigrants worked low-wage jobs in factories (paid less than British descendants, though still more than African Americans). But nowadays, Italians and Irish are considered white, and the income gap between them and British descendants has become significantly smaller (https://en.wikipedia.org/wiki/List_of_ethnic_groups_in_the_United_States_by_household_income). It is not that something genetic has changed for Italian and Irish people that has made them white, but that the categorization of “white” has changed. This change of categorization has correlated with the reduction of the education and income gaps between the aforementioned groups and the original white race. Furthermore, numerous studies conducted since the 50s have shown that, when controlled for income, the gaps in education level and crime rate between African Americans and white people become significantly smaller. In many places, they are hardly statistically significant.
So, if we have to assign a single cause, the evidence would suggest that the systematic impoverishment of African Americans over the past hundreds of years is the cause of their current social and economic situation. But, of course, anything is rarely caused by a single factor. The racial superiority argument is not just false but complete hogwash, no matter how much sense it makes to some.
We should all be extremely careful not to mistake what makes sense for what is true.
On a tangent, interestingly, for someone who believes in determinism or fate, causality doesn’t exist because there is no alternative reality. Oedipus’s fate is determined, therefore no action of his can change the outcome. So none of his actions is the cause of his end. When we look at history, we are looking at what has already happened. It feels certain and inevitable to us because it has already happened. Therefore, interpretations of historical causes (what Karl Popper called “historicism”), even though they provide important perspectives, are a form of after-effect analysis and will remain irrefutable conjectures until we can perfectly recreate historical situations and events and test different potential outcomes.
A reader who is familiar with the difference between necessary and sufficient conditions will notice that when we make the claim “The ground would not be wet if there had been no rain”, we are not saying that rain is a sufficient condition for the wet ground, but that it is a necessary condition. Recall our earlier example of a sufficient but not necessary condition: being born in the United States is a sufficient condition for being a citizen of the United States, but a person can certainly still be a United States citizen even if he/she was not born in the States. Does that mean that being born in the United States did not “cause” the person to become a United States citizen? Furthermore, how would we construct a controlled experiment in which a person is born in a different place, and yet every other experience in that person’s life remains the same: same friends, same education, same school? It would be hard even to imagine such an experiment. “Causality” is simple on paper when we only consider two isolated events A and B, each of which either happens or does not happen, but in the real world events are deeply interconnected, and most things cannot be described by a simple binary variable. The concept of causality provides us with a simple narrative that is easy for our limited minds to make sense of, but reality can rarely be captured by such a simple model, even though this causal model makes a lot of sense to many of us. To quote John Green: “The truth resists simplicity”.
So saying that “causality” means both necessary and sufficient conditions does not work for some of the examples that we consider “causal”. What if we stick with only sufficient conditions? Consider the following scenario: I got drunk at a party last night and woke up with a headache. I ran to my mom and told her that she had caused my headache. It may seem to be an outrageous claim. But think about it: my birth and my headache have both happened. It is impossible for me to have a headache if I was never born. One may argue that there is an alternative universe in which I was born but didn’t drink so much at last night’s party, and so I don’t have the headache in the morning. But I can make a similar argument for the drinking. Even if I had drunk the same amount, if I had drunk enough water afterwards or taken some pills, I wouldn’t have the headache either. That is, drinking too much is not a sufficient condition for my headache, any more than my birth is. And yet we consider one of them a cause and the other not.
For Plato’s postulation of essence, all things that satisfy the essence of a concept belong to the concept, while all things that do not satisfy the essence do not belong to it. (Readers may notice that this is very similar to set theory, and there is a deep connection between the meaning of language and set theory. Readers who are interested should look into Russell’s paradox.) But here we have two examples that, whichever definition we choose, satisfy or fail the essence in the same way, and yet we accept one as a cause but not the other.
The Grounding Problem of Language
The 20th-century philosopher Ludwig Wittgenstein noticed that Plato’s essentialist view of language simply does not work for most linguistic concepts. Wittgenstein stated that, instead of essence preceding existence, “meaning is use”. For most linguistic concepts (words, phrases), there does not exist a set of properties that is shared by all the references of the concept, while no reference outside the concept has all of those properties. (Frege used the word “reference” for a real-world example of a word. For the word “car”, my personal vehicle parked downstairs is a reference of the word “car”. For abstract concepts, such as “justice”, a reference is an event that we consider “just”.) Wittgenstein described the connection between the references of a linguistic concept as a “family resemblance”: a given reference of a concept shares some property with some other reference of the same concept, but there is no commonality across all the references of the concept. That is, there are no universal and consistent definitions for most natural language concepts. Some machine learning algorithms aim to overcome this limitation by modeling linguistic concepts statistically: when a reference (a picture of an animal, for example) is statistically similar enough to some of the training references (pictures of cats) that we accept as associated with a concept (cat), the algorithm concludes that the reference belongs to the concept (that it is a picture of a cat). Wittgenstein called the references that most of a linguistic group accepts as associated with a concept paradigm cases. Whether using statistical models to capture the inconsistent and ambiguous reasoning associated with human natural language is brilliant, or whether any model based on inconsistent reasoning is necessarily problematic, is a debate I will leave to the readers.
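The statistical approach described above can be sketched as a nearest-neighbor lookup against paradigm cases. This is only a toy illustration and not how any particular system works: the two numeric features (weight in kilograms, shoulder height in meters), the example values, and the distance threshold are all invented for the sketch.

```python
import math

# Hypothetical paradigm cases: label -> feature vectors (weight kg, height m).
paradigm_cases = {
    "cat": [(4.0, 0.30), (4.5, 0.25)],
    "dog": [(20.0, 0.60), (30.0, 0.70)],
}


def classify(reference, cases, max_distance=5.0):
    """Return the label of the closest paradigm case, or None if nothing is close.

    There is no definition of 'cat' here, only resemblance to accepted examples.
    """
    best_label, best_distance = None, math.inf
    for label, examples in cases.items():
        for example in examples:
            distance = math.dist(reference, example)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label if best_distance <= max_distance else None


small_animal = classify((5.0, 0.30), paradigm_cases)
huge_animal = classify((100.0, 1.50), paradigm_cases)

print("5 kg animal is classified as:", small_animal)
print("100 kg animal is classified as:", huge_animal)
```

A new reference only has to resemble one accepted example closely enough, which mirrors the idea that references of a concept need not all share a common property.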
Wittgenstein’s theory seems to be supported by how language is learned by humans and how it evolves. When we were kids, we didn’t learn language using definitions from dictionaries, but rather from how our linguistic group (our family members, peers, teachers, etc.) used those words. We learned the concept of “red” from other members of our linguistic group pointing at different objects that they considered “red”. Also, the meaning of a word changes when we use it in different situations: the word “career” only referred to horse racing until the industrial revolution. In medieval times, the word “girl” referred to a young child regardless of gender. For a more recent example, the word “literally” has also shifted its meaning. It no longer always means “exactly as the words suggest”. Take the following sentence as an example: “I literally did not have any food this morning.” Without any additional context, the listener would have no idea whether I had any food or not.
Though some politicians may disagree, I consider the purpose of language to be facilitating communication. But it is kind of problematic when each of us has our own different collection of references for a given word. This is why, when we engage in a conversation about something, especially abstract concepts, we should give listeners enough context through asking and answering questions, so that the participants in the conversation are talking about the same thing, instead of each person relying on their own understanding of the word. This is sometimes called the Socratic method.
I would argue that it is also a good thing that language allows us to discuss concepts such as “justice”, “morality”, and “human rights” without the need for a concrete definition. Only when we have the language can we communicate and discuss those ideas with others. Because language is adaptive and changeable, we can refine and change the references of those words as our understanding of the universe grows, to remove contradictions. Humans once believed that “freedom” meant “all men are created equal, but some are born in chains”. But now we no longer accept slavery as part of a free country. Maybe one day we can truly acquire the essence of concepts like “justice”, “morality”, and “human rights”, that is, as Immanuel Kant desired, one that is universalizable without contradiction.
So, what are the paradigm cases for “causality”? Well, they have also changed throughout history. Before the Enlightenment, it was widely accepted that divine or demonic power, fate, and witches could all be the causes of events in our daily life. In the early 20th century, natural selection, psychology, and genetics were popular causes for explaining phenomena. Nowadays, psychological causes remain popular. In addition, physiological and sociological causes have gained a lot of traction, just to name a few. We learn what “causality” is in a similar way, by listening to how our linguistic group uses it. When I was growing up, I learned that not brushing my teeth will cause cavities, and that watching TV for too long will cause nearsightedness.
Though many events seemingly happen at random, humans have observed that many events are also seemingly correlated. By controlling certain events we can sometimes actually alter the outcome to what we wish for (or at least we believe so). Some of those relationships between events became the paradigm cases for causality. (The reader may notice that, since we expect that changing event A will change the outcome B, A is not simply a sufficient condition for B when we say A has caused B.) Even though the world can’t really be described by events with delineated beginnings and ends, most things do not fit our binary model (right/wrong, true/false, black/white), and there are always far more than a handful of factors in play, the word causality has given us the ability to describe those relationships, make predictions, and even tailor the outcomes. The word causality has also allowed us to communicate those learned experiences and to pass knowledge to future generations before the discovery of formal logic, mathematics, and other more complex models. To me, the idea of causality symbolizes humanity’s desire for knowledge and our incredible ability to understand the relationships between observations. But causality is also an extremely limited and inconsistent model that fails in many cases.
So does “causality” exist? If existence means that we have a logically consistent definition for it, I would have to say no, at least not in the same way that a triangle exists. And yet it is also real, at least to us, as it is an integral part of the mental model with which we attempt to make sense of how the world works.
Epilogue: Science and Causality
“Science” is another word that, throughout history, has not been so well defined. Karl Popper, in his work Conjectures and Refutations, developed the important insight that certain theories (which he called “scientific”) are better at predicting future events than some others (which he called “pseudoscientific”). But for my discussion here, I want to broaden the definition of scientific theories to the following:
A scientific theory is a human-constructed model for explaining the relationship of events in the past, present, and future, with the aim of making future predictions and controlling the outcome.
Some scientific fields, such as physics and chemistry, rely on mathematics for developing their models (mathematics is a consistent language in which all concepts are defined by necessary and sufficient conditions). Other fields, such as biology, medicine, and social science, use models that are constructed using natural language (with most concepts defined by examples). In a way, our everyday thinking and speaking are our attempts to construct models that describe and predict the observations from our experience, and the relationship between our own subjective feelings, our motivations, and the world outside us. Causality is just one of the models we use to describe how the world works, albeit a very simple one. Gut feeling or intuition is another common model.
I want to leave the reader with the following questions:
- Does it matter whether the model is based on formal mathematics or natural language, as long as we can make accurate predictions and get the outcome that we want? Or is it even possible to achieve an accurate prediction when the language we use for modeling is inherently ambiguous and inconsistent?
- We know that a good model, such as quantum mechanics, can predict the future with astonishing accuracy. Meanwhile, anyone who is remotely familiar with machine learning, or has used Siri, knows that a lot of the time those models do not do such a good job at prediction. Should we always seek to refine our model when observation contradicts our prediction, or is there some point at which we can call it good enough?
- When we try to use causality to make sense of the world, are we using it (most of the time) to construct irrefutable explanations of the past (“I had a headache today, because I had too much to drink”), or to predict the future (“If I drink the same amount again, I will have a headache again”), or to control the outcome (“If I don’t want to have a headache tomorrow, I should not drink as much”)?
- In an episode of Avatar: The Last Airbender, a fortuneteller foretold to one of the villagers that he would meet the love of his life while wearing a pair of red shoes. The villager has been wearing red shoes ever since. The fortune probably will come true. But if the prediction is correct, does that mean the model that was used for the prediction is correct? Or is there a universal verification method, what Karl Popper calls the “scientific method”, that can be applied to the verification of all models? If so, what is it? Why is it universal? And how is it different from other verification methods, for example, the verification of the fortune?
During the early 20th century, Social Darwinism was a popular theory. It states that the rich are rich because they are evolutionarily superior, and therefore they will remain rich. The inequality between the rich and the poor is the inevitable outcome of evolution, so any attempt at government regulation is useless. For decades, the theory was tremendously successful in predicting the outcome: the rich were getting richer and the poor were getting poorer. Given that this was the outcome the rich hoped for, the theory was also good at achieving the desired outcome. Only when government stepped in and started to regulate large corporations was the falsity of the theory revealed.
Certain theories do have a tendency to perpetuate the status quo and, by extension, themselves, and only when we seek to falsify them can their falsity be revealed. We live in an age in which more and more scientific theories are proposed each year. We are using machine learning models to predict future crimes, so that convicts deemed more likely to commit crimes in the future receive harsher sentences, and those predictions are disproportionately biased against some races. When we apply those models to draw conclusions, let us not forget that they are just models, a shadow of our reality, not the reality itself.