Whole-genome sequencing and de novo assembly of a 2019 novel coronavirus (SARS-CoV-2) strain isolated in Vietnam


  • Le Tung Lam, Institute of Biotechnology, Vietnam Academy of Science and Technology
  • Nguyen Trung Hieu, National Influenza Center, Pasteur Institute of Ho Chi Minh City
  • Nguyen Hong Trang, Institute of Biotechnology, Vietnam Academy of Science and Technology
  • Ho Thi Thuong, Institute of Biotechnology, Vietnam Academy of Science and Technology
  • Tran Huyen Linh, Institute of Biotechnology, Vietnam Academy of Science and Technology
  • Luu Thuy Tien, Institute of Biotechnology, Vietnam Academy of Science and Technology
  • Nguyen Thi Ngoc Thao, National Influenza Center, Pasteur Institute of Ho Chi Minh City
  • Huynh Thi Kim Loan, National Influenza Center, Pasteur Institute of Ho Chi Minh City
  • Pham Duy Quang, Pasteur Institute of Ho Chi Minh City
  • Luong Chan Quang, Pasteur Institute of Ho Chi Minh City
  • Cao Minh Thang, Pasteur Institute of Ho Chi Minh City
  • Nguyen Vu Thuong, Pasteur Institute of Ho Chi Minh City
  • Hoang Ha, Institute of Biotechnology, Vietnam Academy of Science and Technology
  • Chu Hoang Ha, Institute of Biotechnology, Vietnam Academy of Science and Technology; Graduate University of Science and Technology
  • Phan Trong Lan, Pasteur Institute of Ho Chi Minh City
  • Truong Nam Hai, Institute of Biotechnology, Vietnam Academy of Science and Technology; Graduate University of Science and Technology




COVID-19, de novo sequencing, PacBio, SARS-CoV-2, SMRT sequencing, whole genome sequencing


The COVID-19 pandemic, caused by the virus SARS-CoV-2, has devastated countries worldwide, infecting more than 4.5 million people and causing more than 300,000 deaths as of May 16th, 2020. Whole-genome sequencing (WGS) is an effective tool for monitoring emerging strains and providing information for intervention, thereby helping to inform outbreak control decisions. Here, we report the first effort in Vietnam to sequence and de novo assemble the whole genome of SARS-CoV-2 using PacBio’s SMRT sequencing technology. We also present the annotation results and a brief analysis of the variants found in our SARS-CoV-2 strain, which was isolated from a Vietnamese patient. Sequencing and de novo assembly were completed in less than 30 hours, yielding a single gap-free contig of 29,766 bp. All variants detected relative to the NCBI reference sequence were confirmed by Sanger sequencing, indicating high accuracy. These results show the potential of long-read sequencing to provide high-quality WGS data that can support public health responses and advance understanding of this and future pandemics.



