Taxonomic assignment for large-scale metagenomic data on high-performance systems
DOI: https://doi.org/10.15625/1813-9663/33/2/10753
Keywords: DNA sequences, homology search, metagenomics, parallel algorithm, taxonomic assignment
Abstract
Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is identifying the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, taxonomic assignment is highly computationally intensive. To address this computational challenge, this study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL. The proposed algorithm is evaluated with both simulated and real datasets on a high-performance computing system. Experimental results demonstrate that the algorithm achieves good performance and utilizes the resources of the system efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.
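SeMetaPL's internals are not described on this page. The following is only a minimal sketch, under stated assumptions, of the general idea that makes taxonomic assignment amenable to parallelization: each read can be classified independently, so reads can be partitioned among workers and the per-worker results merged. Everything in it is hypothetical: the function names (`kmers`, `assign_read`, `assign_chunk`, `parallel_assign`), the toy sequences, and the shared k-mer score, which is a crude stand-in for a real homology search such as BLAST. A thread pool stands in for the compute nodes of an HPC system.

```python
"""Toy sketch of data-parallel taxonomic assignment (NOT SeMetaPL itself).

Hypothetical assumptions, for illustration only:
- each reference sequence is labelled with exactly one taxon;
- read-to-reference similarity = number of shared k-mers, a crude
  stand-in for a real homology search such as BLAST;
- reads are independent, so they can be split into chunks, processed
  by separate workers, and the partial results merged.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

K = 4  # hypothetical k-mer length


def kmers(seq: str, k: int = K) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read: str, ref_index: Dict[str, set],
                min_shared: int = 1) -> Optional[str]:
    """Assign a read to the taxon whose reference shares the most k-mers.

    Ties keep the taxon seen first; returns None (unassigned) when no
    reference shares at least min_shared k-mers with the read.
    """
    read_kmers = kmers(read)
    best_taxon, best_score = None, 0
    for taxon, ref_kmers in ref_index.items():
        score = len(read_kmers & ref_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_shared else None


def assign_chunk(chunk: List[Tuple[str, str]],
                 ref_index: Dict[str, set]) -> List[Tuple[str, Optional[str]]]:
    """Work done by one worker: classify every (read_id, sequence) in its chunk."""
    return [(read_id, assign_read(seq, ref_index)) for read_id, seq in chunk]


def parallel_assign(reads: List[Tuple[str, str]],
                    references: Dict[str, str],
                    n_workers: int = 4) -> Dict[str, Optional[str]]:
    """Partition reads round-robin across workers, then merge the results."""
    # Build the (read-only) reference index once and share it with all workers.
    ref_index = {taxon: kmers(seq) for taxon, seq in references.items()}
    # Round-robin split: worker i receives reads i, i + n, i + 2n, ...
    chunks = [reads[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(assign_chunk, chunks,
                                   [ref_index] * n_workers)
        merged: Dict[str, Optional[str]] = {}
        for part in partial_results:
            merged.update(part)
    return merged


if __name__ == "__main__":
    references = {"TaxonA": "ACGTACGTGGCC", "TaxonB": "TTGACCAGTTAA"}
    reads = [("r1", "ACGTACGT"), ("r2", "GACCAGTT"), ("r3", "CCCCCCCC")]
    print(parallel_assign(reads, references, n_workers=2))
```

Round-robin chunking keeps the per-worker load roughly balanced when reads have similar lengths. Because the scoring is CPU-bound, a real speedup in CPython would require `ProcessPoolExecutor` (which offers the same `map` interface) or distributing chunks across cluster nodes, for example with MPI.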
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work presents our own results, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.