=0.00075 and one with a mean of 0.00024 and s.d. Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). Note that six of these sequences fall under the terms of use of the GISAID platform. Coronavirus Disease 2019 (COVID-19) Situation Report 51 (World Health Organization, 2020). Below, we report divergence time estimates based on the HCoV-OC43-centred rate prior for NRR1, NRR2 and NRA3 and summarize corresponding estimates for the MERS-CoV-centred rate priors in Extended Data Fig. Specifically, progenitors of the RaTG13/SARS-CoV-2 lineage appear to have recombined with the Hong Kong clade (with inferred breakpoints at 11.9 and 20.8 kb) to form the CoVZXC21/CoVZC45 lineage. 62,63), the GTR+Γ model and 100 bootstrap replicates was inferred for each BFR >500 nt. eLife 7, e31257 (2018). Preprint at https://doi.org/10.1101/2020.05.28.122366 (2020). In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history (ref. 40). 725422-ReservoirDOCS). PI signals were identified (with bootstrap support >80%) for seven of these eight breakpoints: positions 1,684, 3,046, 9,237, 11,885, 21,753, 22,773 and 24,628. Boxplots show interquartile ranges, white lines are medians and box whiskers show the full range of the posterior distribution. Given that these pangolin viruses are ancestral to the progenitor of the RaTG13/SARS-CoV-2 lineage, it is more likely that they are also acquiring viruses from bats. Evolutionary rate estimation can be profoundly affected by the presence of recombination (ref. 50). Note that breakpoints can be shared between sequences if they are descendants of the same recombination events.
The latter was reconstructed using IQTREE (ref. 66) v.2.0 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation. The red and blue boxplots represent the divergence time estimates for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV (blue) from their most closely related bat virus, with the light- and dark-colored versions based on the HCoV-OC43 and MERS-CoV centered priors, respectively. 1a-c), has the third-highest number of confirmed COVID-19 cases in the state of São. SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. Sarbecovirus, HCoV-OC43 and SARS-CoV data were assembled from GenBank to be as complete as possible, with sampling year as an inclusion criterion. When viewing the last 7 kb of the genome, a clade of viruses from northern China appears to cluster with sequences from southern Chinese provinces but, when inspecting trees from different parts of ORF1ab, the N. China clade is phylogenetically separated from the S. China clade. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Since experts have suggested that pangolins may be the reservoir species for COVID-19, the scaly anteater has been catapulted into headlines, news reports, and conversations, and some are calling COVID-19 "the revenge of the . Divergence time estimates based on the HCoV-OC43-centred rate prior for the separate BFRs (Supplementary Table 3) show consistency in TMRCA estimates across the genome. Dudas, G., Carvalho, L. M., Rambaut, A. Viruses 11, 174 (2019). A new coronavirus associated with human respiratory disease in China.
Instead, similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. These residues are also in the Pangolin Guangdong 2019 sequence. Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown (refs. 22,25), have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. After removal of A1 and A4, we named the new region A. One study suggests that over a century ago, one lineage of coronavirus circulating in bats gave rise to SARS-CoV-2, RaTG13 and a pangolin coronavirus known as Pangolin-2019 (Live Science). Lancet 383, 541–548 (2013). As a proxy, it would be possible to model the long-term purifying selection dynamics as a major source of time-dependent rates (refs. 43,44,52), but this is beyond the scope of the current study. BEAGLE 3: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics. 874850).
The lineage B.1 has been the major basal and widespread lineage from the initial SARS-CoV-2 spread and it became the more prevalent lineage in Colombia (13), while the B.1.111 lineage, first detected in the USA from a sample collected on March 7, 2020 and subsequently in Colombia on March 13, 2020, is currently circulating and mainly represented … The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. Nucleotide positions for phylogenetic inference are 147–695, 962–1,686 (first tree), 3,625–9,150 (second tree, also BFR B), 9,261–11,795 (third tree, also BFR C), 12,443–19,638 (fourth tree) and 23,631–24,633, 24,795–25,847, 27,702–28,843 and 29,574–30,650 (fifth tree). The web application was developed by the Centre for Genomic Pathogen Surveillance. Two other bat viruses (CoVZXC21 and CoVZC45) from Zhejiang Province fall on this lineage as recombinants of the RaTG13/SARS-CoV-2 lineage and the clade of Hong Kong bat viruses sampled between 2005 and 2007 (Fig. Because the estimated rates and divergence dates were highly similar in the three datasets analysed, we conclude that our estimates are robust to the method of identifying a genome's NRRs. The sizes of the black internal node circles are proportional to the posterior node support. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin … Its origin and direct ancestral viruses have not been … Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature.
Scientists trying to trace the ancestry of SARS-CoV-2, the virus responsible for COVID-19, have found the pangolin is unlikely to be the source of the virus responsible for the current pandemic. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, https://doi.org/10.1038/s41564-020-0771-4. BFRs were concatenated if no phylogenetic incongruence signal could be identified between them. Cell 181, 223–227 (2020). Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus. Scientists assigned this variant the Pango lineage B.1.1.523, and it was originally recognized as a variant under monitoring on July 14, 2021. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Boxes show 95% HPD credible intervals. BEAST inferences made use of the BEAGLE v.3 library (ref. 68) for efficient likelihood computations. 36) (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ) and considered recombination signals detected by more than two methods for breakpoint identification. Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. The unsampled diversity descended from the SARS-CoV-2/RaTG13 common ancestor forms a clade of bat sarbecoviruses with generalist properties (with respect to their ability to infect a range of mammalian cells) that facilitated its jump to humans and may do so again. Sorting these breakpoint-free regions (BFRs) by length results in two segments >5 kb: an ORF1a subregion spanning nucleotides (nt) 3,625–9,150 and the first half of ORF1b spanning nt 13,291–19,628 (sequence numbering given in Source Data, https://github.com/plemey/SARSCoV2origins).
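The step of deriving breakpoint-free regions from a set of breakpoints and sorting them by length can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name `breakpoint_free_regions`, the coordinate convention (1-based, inclusive, with a breakpoint marking the last position of a region) and the example breakpoints and alignment length are all assumptions made for the example. Only the >500 nt threshold is taken from the text.

```python
def breakpoint_free_regions(breakpoints, alignment_length):
    """Split an alignment into regions between consecutive breakpoints.

    Hypothetical convention: coordinates are 1-based and inclusive, and a
    breakpoint at position b ends one region at b and starts the next at b + 1.
    Returns (start, end) tuples sorted from longest to shortest.
    """
    edges = [0] + sorted(set(breakpoints)) + [alignment_length]
    regions = [
        (left + 1, right)
        for left, right in zip(edges, edges[1:])
        if right > left  # skip empty regions (e.g. a breakpoint at the very end)
    ]
    return sorted(regions, key=region_length, reverse=True)


def region_length(region):
    """Length in nucleotides of a 1-based inclusive (start, end) region."""
    start, end = region
    return end - start + 1


# Illustrative values only; these are not the breakpoints from the study.
example_breakpoints = [1000, 4000, 4300]
example_length = 10000

bfrs = breakpoint_free_regions(example_breakpoints, example_length)
# Keep only regions long enough for tree inference (>500 nt, as in the text).
long_bfrs = [r for r in bfrs if region_length(r) > 500]
for start, end in long_bfrs:
    print(f"nt {start}-{end} ({region_length((start, end))} nt)")
```

Sorting longest-first makes it easy to pick out the few regions that are long enough to carry phylogenetic signal, which is the purpose of the sorting described above.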
By 2009, however, rapid genomic analysis had become a routine component of outbreak response. 1 Phylogenetic relationships in the C-terminal domain (CTD). SARS-CoV-2 is an appropriate name for the new coronavirus. Wong, A. C. P., Li, X., Lau, S. K. P. & Woo, P. C. Y. In the case of the DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22, and results with scores <22 were discarded. To evaluate the performance procedure, we confirmed that the recombination masking resulted in (1) a markedly different outcome of the PHI test (ref. 64), (2) removal of well-supported (bootstrap value >95%) incompatible splits in Neighbor-Net (ref. 65) and (3) a near-complete reduction of mosaic signal as identified by 3SEQ. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed. Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? As of December 2, 2021, SJdRP, a medium-sized city in the Northwest region of São Paulo state, Brazil (Fig. In light of these time-dependent evolutionary rate dynamics, a slower rate is appropriate for calibration of the sarbecovirus evolutionary history. The difficulty in inferring reliable evolutionary histories for coronaviruses is that their high recombination rate (refs. 48,49) violates the assumption of standard phylogenetic approaches because different parts of the genome have different histories. Identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in China. pangoLEARN: store of the trained model for pangolin to access. The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans (ref. 47).
27) receptors and its RBD being genetically closer to a pangolin virus than to RaTG13 (refs. All authors contributed to analyses and interpretations. & Boni, M. F. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Extensive diversity of coronaviruses in bats from China. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20, 1.68), 0.35 (0.30, 0.41) and 0.133 (0.129, 0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales (refs. 44,45,46). performed codon usage analysis. SARS-like WIV1-CoV poised for human emergence. Except for specifying that sequences are linear, all settings were kept to their defaults. Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST (ref. 42) v.1.10.4. We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. Lam, T. T. 
et al. 3 Priors and posteriors for evolutionary rate of SARS-CoV-2. Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. It is RaTG13 that is more divergent in the variable-loop region (Extended Data Fig. # File containing the ID of the samples, the Sequence of the haplotype, the Continent, the Country, the Region, the Data, the Lineage of Pangolin and Nextstrain clade, and the haplotype number, in this order (could be obtained from the database). We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003). The authors declare no competing interests. The presence in pangolins of an RBD very similar to that of SARS-CoV-2 means that we can infer this was also probably in the virus that jumped to humans. Temporal signal was tested using a recently developed marginal likelihood estimation procedure (ref. 41) (Supplementary Table 1). DOI: https://doi.org/10.1038/s41564-020-0771-4. To gauge the length of time this lineage has circulated in bats, we estimate the time to the most recent common ancestor (TMRCA) of SARS-CoV-2 and RaTG13. Robertson, D. nCoV's relationship to bat coronaviruses & recombination signals (no snakes) no evidence the 2019-nCoV lineage is recombinant. Stamatakis, A. 
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. However, the coronavirus isolated from pangolin is 99% similar in a specific region of the S protein, which corresponds to the 74 amino acids involved in the ACE (Angiotensin Converting Enzyme . Because the SARS-CoV-2 S protein has been implicated in past recombination events or possibly convergent evolution (ref. 12), we specifically investigated several subregions of the S protein: the N-terminal domain of S1, the C-terminal domain of S1, the variable-loop region of the C-terminal domain, and S2. This study provides an integration of existing classifications and describes evolutionary trends of the SARS-CoV . Sibling lineages to RaTG13/SARS-CoV-2 include a pangolin sequence sampled in Guangdong Province in March 2019 and a clade of pangolin sequences from Guangxi Province sampled in 2017. Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable loop region and CTD region excluding the variable loop. Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. Uncertainty measures are shown in Extended Data Fig. This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses (refs. 22,33); specifying all past events in their correct temporal order (ref. 34) is challenging and not shown here. It is available as a command line tool and a web application. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating.
Unfortunately, a response that would achieve containment was not possible. COVID-19 lineage names can be confusing to navigate; there are many aliases, and if you want to catch them all to examine further in data analyses it helps to … The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. Rambaut, A., Lam, T. T., Carvalho, L. M. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real time (4). This boundary appears to be rarely crossed. & Bedford, T. MERS-CoV spillover at the camel–human interface. Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. When the first genome sequence of SARS-CoV-2, Wuhan-Hu-1, was released on 10 January 2020 (GMT) on Virological.org by a consortium led by Zhang (ref. 6), it enabled immediate analyses of its ancestry. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel Coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. We thank T. Bedford for providing M.F.B. Decimal years are shown on the x axis for the 1.2 years of SARS sampling in c. 
d, Mean evolutionary rate estimates plotted against sampling time range for the same three datasets (represented by the same colour as the data points in their respective RtT divergence plots), as well as for the comparable NRA3 using the two different priors for the rate in the Bayesian inference (red points). 26 March 2020. ISSN 2058-5276 (online). Epidemiology, genetic recombination, and pathogenesis of coronaviruses. Our approach resulted in similar posterior rates using two different prior means, implying that the sarbecovirus data do inform the rate estimate even though a root-to-tip temporal signal was not apparent. In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk (refs. 9,10). Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019), with the light and dark coloured versions based on the HCoV-OC43 and MERS-CoV centred priors, respectively. The Artic Network receives funding from the Wellcome Trust through project no. Bayesian evaluation of temporal signal in measurably evolving populations. We thank A. Chan and A. Irving for helpful comments on the manuscript. While it is possible that pangolins, or another hitherto undiscovered species, may have acted as an intermediate host facilitating transmission to humans, current evidence is consistent with the virus having evolved in bats, resulting in bat sarbecoviruses that can replicate in the upper respiratory tract of both humans and pangolins (refs. 25,32).
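The root-to-tip (RtT) divergence plots referred to in the figure caption above rest on a simple idea: if sequences accumulate substitutions at a roughly constant rate, genetic distance from the root should increase linearly with sampling date. The Python sketch below fits that line by ordinary least squares. It is a generic illustration of root-to-tip regression, not the Bayesian procedure used in the study; the function name `root_to_tip_regression` and the example dates and divergences are hypothetical.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    Returns (slope, intercept, root_date). The slope is a crude rate estimate
    (substitutions per site per year) and root_date is the x-intercept, i.e.
    the date at which the fitted divergence reaches zero. root_date is None
    when the slope is not positive, which indicates no usable temporal signal.
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    root_date = -intercept / slope if slope > 0 else None
    return slope, intercept, root_date


# Hypothetical sampling years and root-to-tip distances (substitutions/site).
example_dates = [2000.0, 2005.0, 2010.0]
example_divergences = [0.010, 0.015, 0.020]

rate, offset, root = root_to_tip_regression(example_dates, example_divergences)
print(f"rate ~ {rate:.4g} subst/site/yr, implied root date ~ {root:.1f}")
```

When the sampling window is short relative to the depth of the tree, the fitted slope becomes unreliable, which is one reason an informative rate prior is used instead of relying on this regression alone.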
2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 10–15% divergent throughout the entire S protein (excluding the N-terminal domain). A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the Spike protein. The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus . EPI_ISL_410721) and Beijing Institute of Microbiology and Epidemiology (W.-C. Cao, T.T.-Y.L., N. Jia, Y.-W. Zhang, J.-F. Jiang and B.-G. Jiang, nos. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. Holmes, E. C., Rambaut, A. Posterior distributions were approximated through Markov chain Monte Carlo sampling, with chains run sufficiently long to ensure effective sample sizes >100. Bryant, D. & Moulton, V. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. A pneumonia outbreak associated with a new coronavirus of probable bat origin. For coronaviruses, however, recombination means that small genomic subregions can have independent origins, identifiable if sufficient sampling has been done in the animal reservoirs that support the endemic circulation, co-infection and recombination that appear to be common. Nature Microbiology