Thu 2022-Jan-06

Reeeeeeally long COVID!

Tagged: COVID / JournalClub / Statistics / PharmaAndBiotech

How long can COVID-19 go on? If you answered less than $O(10^4)$ years until everyone with susceptible genes is dead… well, think again.

How long can a pandemic go on, really?

We are now starting our 3rd year of a global pandemic. Everybody’s tired of it. Everybody wants it to end so we can move on. How much longer, really, can that possibly take?

[Figure: Curr Biol: Human genetics and 20,000 year old coronavirus epidemic in Asia]

It’s just come to my attention that last summer there was a pretty disturbing paper in Current Biology by Souilmi, et al. on an ancient pandemic, also very likely a coronavirus. [1] We’ll go through it in some detail below, paying attention to their methods, but the conclusion is stark: around 20,000 to 25,000 years ago in East Asia, there was a coronavirus epidemic that lasted long enough to leave a genetic imprint on the human population. It lasted around 20,000 years, stopping only when all those with susceptible genes were dead.

Maybe we should try to avoid that?

[Figure: Zimmer @ NYT: Coronavirus epidemic 20,000 years ago]
[Figure: Machemer @ Smithsonian: 20,000 year old coronavirus epidemic marked human genome]
[Figure: Australian Broadcasting Corporation: Coronavirus epidemic in East Asia 25,000 years ago]
[Figure: Live Science: Coronavirus 25,000 years ago]

If you don’t want to read a full-up scientific paper, or for that matter my summary of it, you can look through popular media summaries. You probably know that here at Chez Weekend we take a dour view of the popular media’s attempts to report science; it’s usually mangled beyond all recognition.

However, we’ve found 4 articles which, after reading the actual paper, seem not to have mangled anything too badly (though they all do leave out a lot!).

  • Carl Zimmer at the New York Times is a pretty good science journalist, and it shows in the work he did reporting on this last June in his column. [2]
  • Theresa Machemer at the Smithsonian Magazine did a similarly credible job [3] around the same time. She also extracts the tidbit that the epidemic of coronavirus infection in East Asia went on for 20,000 years, which should get you to notice that just “muddling through” is the incorrect response here.
  • The Australian Broadcasting Corporation (ABC)’s science reporters Conroy and Salleh also summarized things relatively clearly. [4]
  • Yasemin Saplakoglu at LiveScience wrote an even briefer, though relatively accurate summary all the way back in April [5], so she gets pride of place for doing the early reporting when the paper was still a preprint.

Some history

[Figure: WorldOMeter: COVID-19 daily cases & deaths as of 2022-Jan-06]

There are 7 coronaviruses known to infect humans.

  • Of those, 4 are relatively recent and cause something like a cold: HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1. [6]
    • Together, they account for about 15% - 30% of common colds and upper respiratory infections in adults, while rarely causing life-threatening lower respiratory infections in infants, the elderly, and the immunocompromised.
    • Based on known rates of mutation, they are relatively new in evolutionary terms. The newest, HCoV-HKU1, emerged in about the 1950s. The oldest, HCoV-NL63, looks like it emerged about 1200CE, or 820 years ago.
  • The remaining 3 are all very recent indeed, and all big troublemakers, each a result of zoönotic transfer (from bats, civets, and camels, apparently):
    • SARS-CoV1 emerged in China in 2002, infecting 8000 and killing 800 before subsiding (a 10% lethality rate).
    • MERS-CoV emerged in 2012 in the Middle East, infecting 2400 and killing over 850 (a 35% lethality rate!).
    • Finally, SARS-CoV2 emerged in 2019 in China (again!), triggering the pandemic in which we find ourselves: as of 2022-Jan-06 there have been 298,549,912 cases and 5,484,607 deaths worldwide (a very fortunate 1.8% lethality rate). [7]
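The lethality rates quoted above are just deaths divided by known cases. A minimal sketch of that arithmetic, using the counts from the list (the dictionary and function names are made up for illustration, and the SARS and MERS counts are the rounded ones given above):

```python
# Case counts and deaths as quoted in the list above (SARS/MERS are rounded).
outbreaks = {
    "SARS-CoV1": (8_000, 800),
    "MERS-CoV": (2_400, 850),
    "SARS-CoV2": (298_549_912, 5_484_607),
}

def case_fatality_pct(cases: int, deaths: int) -> float:
    """Crude case fatality ratio: deaths per 100 known cases."""
    return 100.0 * deaths / cases

for name, (cases, deaths) in outbreaks.items():
    print(f"{name}: {case_fatality_pct(cases, deaths):.1f}%")
```

This is a crude ratio: it ignores undiagnosed infections and people who were still sick when the counts were taken, so it is best read as a rough comparison between the three viruses rather than a precise risk.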

Worse, there is now profound evidence that SARS-CoV2 has infected many wild, farmed, and domesticated animals. That means it can come back by another zoönotic transfer at any time. [8]

This naturally and rather pointedly raises the question: how much longer can this go on?

A look back in deep time

[Figure: Human/coronavirus interactome, excess adaptations, time scale]
[Figure: Zeberg & Pääbo in Nature: Neanderthals and COVID-19 genetic risk factors]
[Figure: Zimmer @ NYT: Neanderthal DNA is both good and bad]

In the paper we’re doing today in Weekend Journal Club, Souilmi and co-authors attempt to answer this question by looking at deep time: if this has happened in the past, is there evidence left in the human genome of viruses forcing us to evolve in response?

It’s a bit of a complicated story. Fortunately, the authors supply us with a “graphical abstract” shown here (click to embiggen).

  • They use previous research to identify 420 human genes that interact with coronaviruses, from mass spectrometry data and from literature curation. 332 of those interact with SARS-CoV2 specifically. Amusingly for speakers of American English, they call these VIPs (viral-interacting proteins). These proteins are very specific to the coronavirus family.
    • Proteins like these will evolve quickly under deadly selection pressure: survivors will either have mutations in the proteins that permit infection, or will have the forms of proteins that tend to prevent infection; those who do not will be dead.
    • There’s plenty of evidence that this has happened, e.g., with some helpful and other harmful genes inherited from Neanderthals 50,000 years ago. [9] [10] (NB: Svante Pääbo, in addition to the distinction of having 2 consecutive umlauts in his name, is the world’s expert on Neanderthal genetics, being the leader of the group that first sequenced Neanderthal DNA from multiple samples. I’ve seen him speak at conferences. He’s famous for slides with no bullet points: he just puts up pictures and talks about them in a very engaging way.)
  • They used the 1000 Genomes Project’s data to look across 2504 human samples from 26 different ethnic/geographic groups across the world, seeking populations that have fixed mutations (i.e., very high frequency in a population) in those VIP genes.
  • They looked for proteins that (a) showed adaptation in the same population at about the same time (using a mutational clock), (b) stopped adaptation around the same time in the same population, (c) were known to be related to coronavirus infection, and (d) were near regulatory regions in the genome associated with expression in the lungs.

The amazing result is that they found evidence of a coronavirus pandemic in the deep past, in a single population (isolated because it was, after all, the Stone Age), whose modern day descendants carry with them their evolutionary adaptations to coronaviruses.

A summary of (some of) their results

Identifying human populations forcibly evolved by coronaviruses

First, let’s look at how they pored over human gene pools in various ethnic and geographic groups. They did a sweep across the VIP genes, looking for statistically significant enrichment of the exact same mutation in group samples. They estimated statistical significance by comparing to a block-randomized genome (adjusted for confounders) to get an idea of the False Positive Rate, and then got a final $p$-value by bootstrap. They also did a Gene Ontology enrichment to reject instances explained by phenomena other than viral interactions.
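To make the idea of "enrichment compared to a randomized genome" concrete, here is a toy sketch. It is not the authors' pipeline: their block randomization controls for confounders and genome structure, while this is a plain label-shuffling permutation test on made-up data. All names and numbers below (`is_vip`, `is_swept`, 1000 genes, 50 VIPs) are hypothetical.

```python
import random

def enrichment_permutation_test(is_vip, is_swept, n_perm=2000, seed=0):
    """Are genes with a selection signal over-represented among VIP genes?

    Returns (observed overlap, fold enrichment vs. the shuffled null, p-value).
    """
    rng = random.Random(seed)
    observed = sum(v and s for v, s in zip(is_vip, is_swept))

    shuffled = list(is_swept)          # copy, so the caller's list is untouched
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any link between VIP status and sweeps
        null_counts.append(sum(v and s for v, s in zip(is_vip, shuffled)))

    expected = sum(null_counts) / n_perm
    fold = observed / expected if expected > 0 else float("inf")
    # One-sided p-value with the usual +1 correction so it is never exactly 0.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, fold, p_value

# Toy genome: 1000 genes, the first 50 are VIPs.
# 20 VIPs and 40 non-VIPs carry a sweep signal.
n_genes = 1000
is_vip = [i < 50 for i in range(n_genes)]
is_swept = [i < 20 or 50 <= i < 90 for i in range(n_genes)]

observed, fold, p_value = enrichment_permutation_test(is_vip, is_swept)
print(f"observed={observed}, fold enrichment={fold:.1f}, p={p_value:.4f}")
```

With these made-up numbers, about 3 swept VIPs are expected by chance, so 20 observed is a large fold enrichment with a small p-value. The smallest p-value such a test can report is set by the number of shuffles, which is one reason real analyses need many randomizations.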

[Figure: Souilmi _et al.:_ Fold enrichment of coronavirus VIPs in East Asian vs non-East Asian populations]

The result is shown here in Figure 1 (click to embiggen).

  • Each plot is an ethnic/geographic group. The top row are East Asian groups; for comparison the bottom row are non-East Asian groups. See how different they are?
  • The vertical axis is a fold enrichment for VIP mutations (presumably the fold is with respect to the block-randomized genomes?).
    • The curve is the enrichment itself, while the gray zone around it is the 95% confidence interval. It’s cut off from above at 20-fold enrichment. Basically the curve goes up if the group has enrichment of the same VIP mutations, and not if the group doesn’t.
    • The red dots, of which there are many, represent significance at $p \lt 0.001$, i.e., there’s only 1 chance in 1000 to see this at random.
    • The horizontal dashed line is what you’d expect to see if there were no effect.

So as you can see, the East Asian populations are enriched for VIP mutations by several measures, whereas the non-East Asian populations are not. Mostly these enriched populations are from China, Viet Nam, and Japan. (Is it a coincidence that the ancient epidemic was in East Asia, and both SARS-CoV1 and SARS-CoV2 emerged in China in modern times? I dunno either, but it makes me uneasy somehow. To avoid any suspicion of prejudice, let’s regard that as coincidence until proven by more data.)

Conclusion: Certain East Asian populations show, with very high confidence, well fixed mutations in genes for the VIP proteins. No such enrichment happens anywhere else, even in neighboring populations. No such enrichment is seen for genes related to other viruses in East Asia. So coronaviruses have driven human evolution in East Asia.

Ok, so when?

The next question: when did this happen?

The first constraint is that the methods used here have limited sensitivity to genetic events more than 30,000 years ago. So that’s an upper limit to how far back we’re looking.

They used a variety of methods to home in more precisely: Ancestral Recombination Graphs, localization of the mutations near regulatory eQTL (expression quantitative trait loci), and so on. We won’t drag through the details here, except to note that the significance thresholds were impressive (iSAFE proximity test $p \lt 10^{-9}$, each VIP gene with ARG $p \lt 10^{-3}$, and so on).

[Figure: Souilmi _et al.:_ Adaptation of 42 VIPs clustered at 870 generations ago]

The result was that there were 42 VIP genes showing adaptation clustered around 870$\pm$200 generations ago. Their Figure 2 shown here (click to embiggen) shows that the time of adaptation of the coronavirus VIP genes (pink) clusters somewhere 770 – 970 generations ago, and that this clustering is much stronger than for all other genes in the genome (blue). This excess is statistically significant at $p \lt 2.3 \times 10^{-4}$.

[Figure: Souilmi _et al.:_ 42 coronavirus VIP gene allele frequencies over time in Chinese Dai and Chinese Han]

You can see the same thing happening if you look at individual genes, and ask when individual allele frequencies started to rise, i.e., when they start to appear in a large fraction of the population. This is their Figure 3, reproduced here (click to embiggen).

  • The top row is Chinese Dai, the bottom row is Chinese Han.
  • The horizontal axis is time (generations ago). The vertical axis is the mutated allele frequency, e.g., 0% - 100% of the population.
  • The graphs on the right are zoomed in to show detail around the time the ancient epidemic started inducing the mutation.

You can clearly see that all 42 genes increasingly had the same mutations at the same time, namely 900ish generations ago. We can also see that the spread of those genes continued until about 200ish generations ago, i.e., it is likely that a coronavirus continued to exert selective pressure (i.e., kill everybody without the protective mutations) for 700ish generations or so.
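For intuition about why an allele can take hundreds of generations to spread, here is a toy calculation. It is not from the paper: it uses the textbook haploid selection recursion $p' = p(1+s)/(1+sp)$, and the selection coefficients `s`, the starting frequency, and the target frequency are all assumptions picked for illustration.

```python
def generations_to_sweep(s: float, p0: float = 0.01, p_target: float = 0.99) -> int:
    """Generations for a beneficial allele to rise from frequency p0 to p_target.

    Uses the textbook haploid selection recursion p' = p(1+s) / (1 + s*p),
    where s is the relative fitness advantage of carriers (an assumed value).
    """
    p, generations = p0, 0
    while p < p_target:
        p = p * (1 + s) / (1 + s * p)   # carriers leave (1+s) times as many offspring
        generations += 1
    return generations

# Assumed, illustrative selection coefficients; not estimates from the paper.
for s in (0.005, 0.01, 0.02, 0.05):
    print(f"s = {s:.3f}: about {generations_to_sweep(s)} generations")
```

In this toy model a fitness advantage of around 1% per generation gives a sweep lasting on the order of 900 generations, the same order of magnitude as the 700ish generations read off the figure. It says nothing about the selection strength the authors actually inferred; it only shows that modest, persistent pressure is enough to produce a very long sweep.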

For a variety of reasons, people use a generation time of 28 years per generation [11], so we’re looking at about 25,000 years ago. That becomes even more interesting when we note that coronaviruses themselves only evolved as a species at about the same time, namely an estimated 23,000 years ago! [12] Coronaviruses almost immediately jumped to humans upon emerging as a distinct viral species; this will happen again and again and again and…
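The conversion from generations to calendar years is simple multiplication. A small sketch, using the 28 years per generation from the text and the "900ish" and "200ish" generation marks read off the figure above (the function and variable names are just for illustration):

```python
YEARS_PER_GENERATION = 28  # generation interval used in the text [11]

def generations_to_years(generations: int,
                         years_per_generation: int = YEARS_PER_GENERATION) -> int:
    """Convert a number of generations ago into years ago."""
    return generations * years_per_generation

start_gen, end_gen = 900, 200                     # approximate values from the figure
start_years = generations_to_years(start_gen)     # 25,200 years ago
end_years = generations_to_years(end_gen)         # 5,600 years ago
duration_years = start_years - end_years          # 19,600 years

print(f"selection began  ~{start_years:,} years ago")
print(f"selection ended  ~{end_years:,} years ago")
print(f"duration         ~{duration_years:,} years")
```

Given how rough the generation counts are, rounding these to 25,000 years ago, 5,000 years ago, and a 20,000 year duration is about as much precision as the numbers can bear.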

Conclusion: We’ll let the authors say it themselves:

Consequently, our results are consistent with the emergence of a viral epidemic ∼900 generations, or ∼25,000 years (28 years per generation), ago that drove a burst of strong positive selection in East Asia. Selection events starting 900 generations ago clearly predate the estimated split of different East Asian populations included in the 1000 Genomes Project from their shared ancestral population.

… [W]e note that the signal is restricted specifically at CoV-VIPs and none of 17 other viruses that we tested exhibit the same temporal clustering.

Ok, so how long did that go on?!

Right. So we know what happened and when it happened, but how long did it go on killing people?

A rough answer for how long this went on is to look at the previous figure, and note that the allele frequency of the specific mutations in the VIP genes stabilized about 200 generations (about 5,000 years) ago. To get a more sophisticated estimate, the authors looked for coordinated changes in the 42 coronavirus VIP genes, since coordination presumably indicates selective pressure from the virus in common across the 42 VIP genes. The result was consistent with selection until about 5,000 years ago. So for 20,000 years, a coronavirus was selectively killing people in East Asia until all those who didn’t have the resistance mutations were dead.

Conclusion: The coronavirus pandemic lasted from 25,000 years ago to 5,000 years ago. Or, in other words, it lasted for 20,000 years.

Note well that figure: Just “riding it out” will take potentially forever. Fortunately, we have more resources than our ancient ancestors. They could only engage in some minor infection-avoiding habits and rely on their genes. We have scientifically validated interventions like masks, social distancing, infection-preventing vaccines, and post-infection therapies like paxlovid.

So, you know the drill by now: Mask. Social distance. Vaccinate. Get paxlovid if you get sick.

But wait, there’s more!

The study goes on to do a lot more stuff; indeed this is only about the first half of the paper.

  • They noted the 42 coronavirus VIP genes are enriched for features known to interact with viruses (and even coronaviruses specifically) according to the Gene Ontology. This gives us some confidence in the correctness of the analysis.
  • Interestingly, some of the 42 genes are druggable, which is a starting point for future antiviral drug research.
    • Four of them (SMAD3, IMPDH2, PPIB, and GPX1) are the targets of 11 existing drugs currently being investigated for coronavirus therapy, so that’s a good thing to keep doing.
    • Five more of them are targeted by multiple drugs for other diseases, so they ought to be investigated for repurposing against COVID-19.
    • An additional six genes are part of the “druggable genome” [13], so we could perhaps find new therapeutic molecules there.

The Weekend Conclusion

Whew! Let’s recap what we’ve learned, and how that informs what we should do:

  1. We know 300-400ish genes which interact with the 7 coronaviruses known to infect humans.
  2. East Asia had a long coronavirus epidemic that fixed mutations in 42 of those genes (but not elsewhere; little long-distance travel in the late Stone Age) 25,000 years ago to 5,000 years ago.
  3. There was continuous selective pressure on the human genome in East Asia (“continuous selective pressure” = people dying) for 20,000 years.
  4. Of the 42 genes that were selectively mutated and fixed in the genome, 4 + 5 + 6 = 15 genes either are targeted by drugs currently in coronavirus trials, or are targeted by other existing drugs, or are in the druggable genome where the chemistry to make a new drug for them looks tractable. There are opportunities for novel coronavirus therapies here.
  5. Evolution is doing it the hard way: people die until only those with resistance mutations are left.
  6. Vaccination is doing it the easy way: people get a couple shots, take a day or so off, and go on about their lives. Don’t make the stupid choice here.

Again: Mask. Social distance. Vaccinate. Get paxlovid & fluvoxamine if you get sick. Support research on drugs for the 15 druggable genes above.

I know you’re tired of doing all that. But how tired would you and your descendants be after 20,000 years of this? Make the smart choice here.


Notes & References

1: Y Souilmi, et al., “An ancient viral epidemic involving host coronavirus interacting genes more than 20,000 years ago in East Asia”, Current Biology 31:16 (2021-Aug-23), pp. 3505-3514. DOI: 10.1016/j.cub.2021.05.067.

2: C Zimmer, “A Coronavirus Epidemic Hit 20,000 Years Ago, New Study Finds”, New York Times, 2021-Jun-24.

3: T Machemer, “Over 20,000 Years Ago, a Coronavirus Epidemic Left Marks in Human DNA”, Smithsonian Magazine, 2021-Jun-30.

4: G Conroy & A Salleh, “Coronavirus epidemic broke out in East Asia around 25,000 years ago, gene study shows”, ABC Science, 2021-Jun-24.

5: Y Saplakoglu, “An ancient coronavirus swept across East Asia 25,000 years ago”, LiveScience, 2021-Apr-23.

6: DX Liu, et al., “Human Coronavirus-229E, -OC43, -NL63, and -HKU1 (Coronaviridae)”, Encyc Virol (2021-Mar-01), 428-440. PMC: PMC7204879. DOI: 10.1016/B978-0-12-809633-8.21501-X.

7: WorldOMeter, “COVID-19 Coronavirus Pandemic”, retrieved the morning of 2022-Jan-06. It’s probably a lot more by the time you read this.

8: T Prince, et al., “SARS-CoV-2 Infections in Animals: Reservoirs for Reverse Zoonosis and Models for Study”, Viruses 13:3, p. 494, 2021-Mar-17. PMC: PMC8002747. DOI: 10.3390/v13030494.

9: H Zeberg & S Pääbo, “The major genetic risk factor for severe COVID-19 is inherited from Neanderthals”, Nature 587, pp. 610-612, 2020-Nov-26. DOI: 10.1038/s41586-020-2818-3. Yes, Svante really has 2 consecutive umlauts in his name, but is also interesting for other reasons.

10: C Zimmer, “Deep in Human DNA, a Gift From the Neanderthals”, New York Times, 2018-Oct-04.

11: P Moorjani, et al., “A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years”, Proc Natl Acad Sci USA 113:20, pp. 5652-5657, 2016-May-17. DOI: 10.1073/pnas.1514696113.

12: M Ghafari, et al., “Prisoner of War dynamics explains the time-dependent pattern of substitution rates in viruses”, BioRχiv preprint, 2021-Feb-09. DOI: 10.1101/2021.02.09.430479.

13: C Finan, et al., “The druggable genome and support for target identification and validation in drug development”, Sci Transl Med 9:383, 2017-Mar-29. DOI: 10.1126/scitranslmed.aag1166.

Published Thu 2022-Jan-06

Gestae Commentaria

That 1.8% lethality rate is really important, and one of the things that has driven my contention that Covid-19 is really just a near miss for the sort of pandemic that really would be bad. But the idea that it’ll be at this level for another 20,000 years is also something to think about. Is what we thought we had – a pandemic lasting for a year or so – actually the near miss, and what we actually have – a pandemic lasting for 20,000 years or so – the real thing?

We don’t know. Extrapolation from 2 years of experience to the next 20,000 years is not for the faint hearted.

Weekend Editor, Fri 2022-Jan-07 21:49

Hi, Louise — always good to hear from you.

That 1.8% lethality rate is really important…

Absolutely! It could have been much, much worse. Especially in 2020 when no vaccines were available: if we’d had a virus with the fierce propagation of Omicron and the lethality of MERS, we might have lost 1/3rd of humanity. That’s in the same territory as the Black Plague in Europe.

Extrapolation from 2 years of experience to the next 20,000 years is not for the faint hearted.

But also keep in mind that 20,000 year epidemic was in a Stone Age society with no science, no medicine, no public health. They had only 2 things to protect them: (a) their genes, and (b) their behaviors (superstitious avoidance of ritually unclean things, and a disgust instinct to avoid stuff that looks or smells bad). That’s not nothing, but it’s also not much.

And it’s not our situation.

We now have well-founded NPI’s like proper masks and social distancing. We have vaccines that work amazingly well. We also have really fascinating research in progress on pancoronavirus vaccines that might work across the whole family. A ferritin nanoparticle looks promising, described by Your Local Epidemiologist as like a soccer ball presenting multiple ligands on each face (apparently referring to this Nature article).

So I highlighted the 20,000 year thing as a riposte to the knuckleheads who say we “just have to let it burn through” and get herd immunity the hard way, apparently largely because scientists are a bunch of lefty elitists with no social skills and poor fashion choices. Or something.

All we have to do is make the smart choices here. Which is, surprisingly to me, proving difficult in an environment full of nascent fascists who deny reality itself.

Perhaps I’m gloomy today.


Thanks for the highlight and walk-through. It’s exciting and humbling to see collaborations that are so broad and deep (Mass spec, paleogenomics, virology, epidemiology, evolution) that no one person could synthesize and understand it all. Except John von Neumann, maybe.

Not entirely unrelated, in November the essayist Philippe Lemoine published his speculation and model of the pandemic, “Have we been thinking about the pandemic wrong? The effect of population structure on transmission”. The title is adequate explanation of his thesis.

I have not seen other explanations for the simple and basic question, “Why are there cresting and receding waves of SARS-CoV-2 infections?” At the beginning, Everybody Knew there was going to be a sigmoidal curve, the only questions were How Fast and How Big. Tomas Pueyo, March 19, 2020, Coronavirus: The Hammer and the Dance. Influential, representative, and (with hindsight) wrong.

Hyperlink embeds didn’t work. Naked html links for the above comment:

Lemoine – https://cspicenter.org/blog/waronscience/have-we-been-thinking-about-the-pandemic-wrong-the-effect-of-population-structure-on-transmission/

Pueyo – https://tomaspueyo.medium.com/coronavirus-the-hammer-and-the-dance-be9337092b56

Weekend Editor, Thu 2022-Jan-06 22:47

I fixed the hyperlinks for you. (Sorry about that: comments here use Github-Flavored Markdown (GFM), rather than pure HTML. Though you can get away with some HTML, I’ve never bothered to figure out exactly which HTML elements work and which don’t.)

It’s exciting and humbling to see collaborations that are so broad and deep (Mass spec, paleogenomics, virology, epidemiology, evolution) that no one person could synthesize and understand it all. Except John von Neumann, maybe.

Yeah, no kidding.

A truly amazing set of only vaguely related technologies (mRNA synthesizers, lipid nanocapsules, huge computing and sequencing resources, drugs like paxlovid…) all came to fruition at the exact right moment. The religious part of me wants to search for divine influence, while the skeptical part of me just notes that of course we use all the tools that come to hand. One lesson is that new knowledge is never wasted: sooner or later, we’ll need it, so we better be able to look it up when the need is upon us.

Re von Neumann: Because of my history I was also bombarded with stories of his contemporaries, most relevantly in this case Norbert Wiener and Enrico Fermi.

  • Wiener would have invented a solution, but would probably forget to tell you about it unless you asked him every week.
  • Grad students eventually became wary of telling Fermi what they were working on, for fear he’d work it all out, solve the problem, and then they’d need a new thesis topic.
  • Von Neumann would have done the same thing, but he’d have invited you to a cocktail party in Princeton afterwards. :-)

And thanks for the suggested readings; they’re in my queue.

