COVID-19

As of today, 11th February 2020, a total of 40,554 cases of COVID-19 have been confirmed, including 909 deaths. Fear of COVID-19 has led to international travel bans into China and city lockdowns within it.

COVID-19 is certainly a very serious epidemic. As my family lives in China, I have found myself constantly looking for news and updates on this subject. But it is frustrating that most of the news I find reports only the conclusions of experts or authorities, and rarely how those conclusions were drawn. So I have gathered some information I could find online, from research papers to news agencies, in the hope of filling in some of the gaps.

What is COVID-19?

COVID-19 is the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2 for short). According to "A pneumonia outbreak associated with a new coronavirus of probable bat origin" by Zhou et al., SARS-CoV-2 is genetically similar to SARS-CoV (79.5%) and to a bat coronavirus (96%). I don't currently have access to the paper, so I can't tell which similarity metric was used. But these numbers alone do not let us conclude whether or not SARS-CoV-2 came from bat coronaviruses. Humans share roughly 99% of their DNA with chimpanzees, yet we now know that humans did not evolve from chimpanzees; the two species share a common ancestor. (We really should stop using the March of Progress in our biology textbooks.) Also, the way single-stranded RNA genomes change over time is very different from that of double-stranded DNA.
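I don't know which metric the paper used, but a small example shows why the choice matters. Below is a minimal Python sketch of one common metric, percent identity over a pairwise alignment, applied to two made-up toy sequences (not real viral data). The function name `percent_identity` and its `count_gaps` switch are hypothetical, chosen only for illustration: the very same alignment scores 70% when base-versus-gap columns count as mismatches, and 87.5% when those columns are simply left out.

```python
def percent_identity(a: str, b: str, count_gaps: bool = True) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    count_gaps=True treats a base-vs-gap column as a mismatch;
    count_gaps=False leaves such columns out of the denominator.
    Columns that are gaps in both sequences are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap in both sequences: carries no information
        if x == "-" or y == "-":
            if count_gaps:
                columns += 1  # base vs gap counted as a mismatch
            continue
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


# Hypothetical toy alignment, not real coronavirus sequences.
seq1 = "ACGUACGUAC"
seq2 = "ACGAAC--AC"
print(percent_identity(seq1, seq2))                    # 70.0
print(percent_identity(seq1, seq2, count_gaps=False))  # 87.5
```

A "similarity" percentage is therefore only comparable with another one when both were computed the same way.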

In "Severe Acute Respiratory Syndrome Coronavirus Sequence Characteristics and Evolutionary Rate Estimate from Maximum Likelihood Analysis", Salemi et al. studied SARS-CoV, the virus responsible for the SARS outbreak in 2003. Across the 21,333 nucleotides they analyzed, they found 63 sites where at least one sequence carried a different nucleotide, and only 10 phylogenetically informative sites. Based on their samples, they estimated a rate of 4 × 10⁻⁴ nucleotide changes per site per year. This suggests that SARS-CoV RNA is fairly stable. If we assume that SARS-CoV-2 has similar RNA stability, then we can use its RNA sequence to identify COVID-19. But how accurate would that match need to be? Just like the genomes of other organisms, SARS-CoV-2's RNA is not identical from one virus to another. For humans, the genetic similarity between individuals has been established to be about 99.9%.
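To get a feel for what 4 × 10⁻⁴ changes per site per year means, we can multiply the rate by the number of sites. Here is a minimal Python sketch of that back-of-the-envelope arithmetic, using the two figures quoted from Salemi et al.; it assumes the rate is constant over time and the same at every site, which is a simplification. The helper name `expected_changes` is hypothetical.

```python
RATE = 4e-4     # nucleotide changes per site per year (Salemi et al., SARS-CoV)
SITES = 21_333  # nucleotides analyzed in the same study


def expected_changes(rate: float, sites: int, years: float = 1.0) -> float:
    """Expected number of nucleotide changes, assuming a constant, uniform rate."""
    return rate * sites * years


per_year = expected_changes(RATE, SITES)
print(f"{per_year:.2f} changes per year")       # 8.53 changes per year
print(f"{100 * RATE:.2f}% of sites per year")   # 0.04% of sites per year
```

That is roughly 8 or 9 changes per year across 21,333 nucleotides, or about 0.04% of sites, which is what makes "fairly stable" a reasonable description.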

I have found a couple of research papers on RNA sequencing, for example "A complete protocol for whole-genome sequencing of virus from clinical samples: Application to coronavirus OC43" by Maurier et al., and "Rapid Sequencing of Multiple RNA Viruses in Their Native Form" by Wongsurawat et al. They report sequencing accuracies of between 94% and 99%, and of 97%, respectively. (This puts the 96% similarity drawn between the bat coronavirus and SARS-CoV-2 uncomfortably in doubt, since sequencing errors of a few percent are about the same size as the 4% difference itself.)
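To see why per-base accuracy matters when comparing two genomes, here is a toy Python simulation. It assumes two hypothetical random sequences that are truly 96% identical, each read once with 97% per-base accuracy, where every error independently swaps a base for a different random base. This is a deliberate simplification (no repeated reads, no error correction) and not how the cited protocols work, so the numbers are illustration only. The helper names `substitute` and `identity` are hypothetical.

```python
import random

BASES = "ACGU"


def substitute(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base, with probability `rate`, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(2020)  # fixed seed so the run is repeatable
n = 100_000
true_a = "".join(rng.choice(BASES) for _ in range(n))
true_b = substitute(true_a, 0.04, rng)  # ~96% true identity
read_a = substitute(true_a, 0.03, rng)  # 97% per-base read accuracy
read_b = substitute(true_b, 0.03, rng)

print(f"true identity:     {identity(true_a, true_b):.3f}")
print(f"observed identity: {identity(read_a, read_b):.3f}")
```

Under these assumptions the observed identity drops to about 90%, well below the true 96%, because an error in either read can turn a matching site into an apparent mismatch.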

So the only way to be certain that a patient has contracted COVID-19 is to isolate the virus from the infected tissue, sequence its RNA, and compare it with the reference sequence that we have categorized as the COVID-19 virus. A common RNA test is the nucleic acid test (NAT), and polymerase chain reaction (PCR) is one type of NAT. According to "Review: Diagnostic accuracy of PCR-based detection tests for Helicobacter Pylori in stool samples" by Khadangi et al., the Helicobacter pylori PCR test had 71% sensitivity and 96% specificity. Of the 328 samples they studied, 26 had been diagnosed with Helicobacter pylori; the PCR test identified 71% of those 26 diagnosed patients as positive, and 96% of the remaining samples as negative. But I was unable to find any research on the accuracy of RNA tests for coronaviruses. (In the example above, Helicobacter pylori causes a bacterial infection, not a viral one.)
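Sensitivity and specificity are easier to interpret as expected counts. Below is a minimal Python sketch that plugs in the figures quoted above (26 diagnosed out of 328 samples, 71% sensitivity, 96% specificity), assuming those rates apply exactly to these samples; the function name `expected_counts` is hypothetical. It also computes the positive and negative predictive values, i.e. the chance that a positive (or negative) result is actually correct.

```python
def expected_counts(n_pos: int, n_neg: int, sensitivity: float, specificity: float):
    """Expected true/false positives and negatives for a test on a labeled sample."""
    tp = sensitivity * n_pos  # diagnosed patients the test flags as positive
    fn = n_pos - tp           # diagnosed patients the test misses
    tn = specificity * n_neg  # undiagnosed samples the test clears
    fp = n_neg - tn           # undiagnosed samples the test flags anyway
    return tp, fn, tn, fp


tp, fn, tn, fp = expected_counts(26, 328 - 26, 0.71, 0.96)
ppv = tp / (tp + fp)  # P(diagnosed | test positive)
npv = tn / (tn + fn)  # P(undiagnosed | test negative)
print(f"TP={tp:.2f} FN={fn:.2f} TN={tn:.2f} FP={fp:.2f}")  # TP=18.46 FN=7.54 TN=289.92 FP=12.08
print(f"PPV={ppv:.2f} NPV={npv:.2f}")                      # PPV=0.60 NPV=0.97
```

Even with 96% specificity, only about 60% of positive results would be correct here, because the 302 undiagnosed samples far outnumber the 26 diagnosed ones.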

Leaving aside the questions of ground truth and accuracy, RNA tests are expensive and take a long time to complete. Until a couple of weeks ago, COVID-19 was diagnosed using lab tests on the patient's nasal or throat mucus samples. But because of the recently discovered problems with test accuracy, and the massive number of patients needing the test, diagnosis has shifted from lab tests to symptom-based clinical diagnosis. That is, a patient who shows a certain subset of a list of symptoms is diagnosed with COVID-19. The number of confirmed cases in China almost doubled last week. The symptoms include fever, cough, shortness of breath, low blood oxygen levels, shadows on CT lung scans, etc. But these are also symptoms of other viral pneumonias. I do not wish to conflate correlation with causation, but I am not sure whether the increase in confirmed cases is caused by the change in diagnostic standards. This also leads to an epistemological problem: if we cannot be confident about which patients have contracted COVID-19, then using samples from those patients as ground truth for serological or NAT tests would compound the inaccuracy further. (From an epidemiological point of view, symptom-based diagnosis may produce a large number of false positives, but to prevent the spread of the epidemic, it is sometimes better to be safe than sorry.)
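The ground-truth problem can be made concrete. If a new test is evaluated against a reference that is itself imperfect (for example, a symptom-based diagnosis), the sensitivity and specificity we measure are not the test's real ones. Here is a minimal Python sketch of that calculation, assuming the errors of the test and of the reference are independent given a patient's true status; the function name `apparent_accuracy` and every number in the example are hypothetical.

```python
def apparent_accuracy(prevalence, se_test, sp_test, se_ref, sp_ref):
    """Sensitivity/specificity of a test as measured against an imperfect reference.

    Assumes the test and the reference make errors independently
    given a patient's true infection status.
    """
    p = prevalence
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)  # P(reference says positive)
    ref_neg = 1 - ref_pos                           # P(reference says negative)
    # P(test positive AND reference positive), summed over truly infected / not infected
    both_pos = p * se_test * se_ref + (1 - p) * (1 - sp_test) * (1 - sp_ref)
    # P(test negative AND reference negative)
    both_neg = p * (1 - se_test) * (1 - se_ref) + (1 - p) * sp_test * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical: a test that is truly 90% sensitive and 95% specific,
# 10% true prevalence, checked against a reference that is 80%/80%.
se, sp = apparent_accuracy(0.10, 0.90, 0.95, 0.80, 0.80)
print(f"apparent sensitivity={se:.2f}, apparent specificity={sp:.2f}")  # 0.31, 0.93
```

With a perfect reference (`se_ref = sp_ref = 1`) the function returns the test's true values. With the hypothetical 80%/80% reference, a test that is truly 90% sensitive appears to be only about 31% sensitive, because most of the reference's "positives" are not actually infected.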

I am certain that COVID-19 is caused by a new strain of coronavirus that can trigger acute respiratory distress syndrome and, in severe cases, can be fatal. But I also find myself struggling to trust the claims reported by the news outlets I encounter on the internet. I find it very difficult to tell conjectures about this outbreak apart from tested theories. Because those tests rely on statistical analysis, which can only establish correlation within the sampled data, it is easy for us to come up with theories that attempt to explain the data. In the information age, it is very easy for us to share news, thoughts, and our interpretations of the situation. But every incorrect guess based on incomplete data that we put out there increases the general public's misunderstanding, and may even misguide the policies that are put in place.

It is crucial for us to be able to differentiate data from the theories formed around the data, and from the theories formed around those theories. Clinically diagnosed patients may or may not actually have COVID-19. The actual number of people who have contracted COVID-19 is difficult to estimate without knowing the sensitivity and specificity of the diagnostic tests. The increase in the number of diagnosed cases over the past couple of weeks may be due not to the spread of the disease, but to changes in the diagnostic tests.
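A standard way to connect observed counts to true counts, given a test's sensitivity and specificity, is the Rogan-Gladen estimator: true prevalence = (apparent prevalence + specificity - 1) / (sensitivity + specificity - 1). Below is a minimal Python sketch in which all numbers (population tested, true infections, and both sets of sensitivity/specificity values) are hypothetical, as are the helper names. It shows how switching diagnostic criteria alone can change the number of diagnosed cases while the number of infected people stays fixed.

```python
def expected_positives(tested: int, infected: int, se: float, sp: float) -> float:
    """Expected positive diagnoses when `infected` of `tested` people truly have the disease."""
    return infected * se + (tested - infected) * (1 - sp)


def rogan_gladen(positives: float, tested: int, se: float, sp: float) -> float:
    """Rogan-Gladen estimate of how many of `tested` people are truly infected.

    Only meaningful when se + sp > 1; the estimate can fall outside
    [0, tested] when the observed positive rate is outside [1 - sp, se].
    """
    apparent = positives / tested
    return tested * (apparent + sp - 1) / (se + sp - 1)


TESTED, INFECTED = 10_000, 1_000  # hypothetical population, held fixed

lab = expected_positives(TESTED, INFECTED, se=0.70, sp=0.96)       # hypothetical lab test
symptoms = expected_positives(TESTED, INFECTED, se=0.90, sp=0.80)  # hypothetical symptom criteria
print(f"lab-test diagnoses:      {lab:.0f}")       # 1060
print(f"symptom-based diagnoses: {symptoms:.0f}")  # 2700
print(f"corrected estimate:      {rogan_gladen(symptoms, TESTED, 0.90, 0.80):.0f}")  # 1000
```

In this made-up scenario the number of diagnoses more than doubles even though the number of infected people never changes, and the correction only recovers the true count because the sensitivity and specificity are known exactly.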

I hope that everyone stays healthy and safe during this outbreak. But also, when we see another update about the outbreak, let us all do a little research before sharing it with others. Correct information is essential for making correct decisions. As we share the world and this crisis together, let us also accept our epistemic responsibility.


P.S. It is easy to make conjectures. It is hard to present evidence that supports them. It is extremely hard, if it is possible at all, to show that the evidence presented unambiguously proves a conjecture. Maybe it is a fever dream to ask for sufficient evidence in a world with so much ambiguity, uncertainty, and unknowns.
