Taxonomic assignment for large-scale metagenomic data on high-performance systems

Vinh Van Le, Hoai Van Tran, Hieu Ngoc Duong, Giang Xuan Bui, Lang Van Tran

Authors

  • Vinh Van Le, Faculty of Information Technology, HCMC University of Technology and Education
  • Hoai Van Tran, Faculty of Computer Science and Engineering, Bach Khoa University
  • Hieu Ngoc Duong, Faculty of Computer Science and Engineering, Bach Khoa University
  • Giang Xuan Bui, Faculty of Computer Science and Engineering, Bach Khoa University
  • Lang Van Tran, Vietnam Academy of Science and Technology

DOI:

https://doi.org/10.15625/1813-9663/33/2/10753

Keywords:

DNA sequences, homology search, metagenomics, parallel algorithm, taxonomic assignment

Abstract

Metagenomics is a powerful approach for studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project must analyze large-scale datasets, taxonomic assignment is computationally intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to address this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high-performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses system resources efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.

Published

29-12-2017

How to Cite

[1]
V. V. Le, H. V. Tran, H. N. Duong, G. X. Bui, and L. V. Tran, “Taxonomic assignment for large-scale metagenomic data on high-performance systems”, JCC, vol. 33, no. 2, pp. 119–130, Dec. 2017.

Section

Computer Science