A novel \(l\)-mer counting method abundance-based binning of metagenomic reads

Le Van Vinh; Tran Van Lang; Tran Van Hoai

doi:10.15625/1813-9663/30/3/3754

Author affiliations

Le Van Vinh Faculty of Computer Science and Engineering, HCMC University of Technology, Vietnam
Tran Van Lang Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology
Tran Van Hoai Faculty of Computer Science and Engineering, HCMC University of Technology, Vietnam

Keywords:

Metagenomics, binning, \(l\)-mer counting, DNA sequence, next-generation sequencing

Abstract

The binning of reads is a crucial step in metagenomic data analysis. While unsupervised methods based on compositional features are effective only for long reads, genome abundance-based methods are commonly used to bin short reads. Previous abundance-based binning approaches usually rely on fixed-length \(l\)-mer frequencies to separate reads into groups such that the reads in each group belong to genomes (or species) with very similar abundances. However, their classification performance is highly sensitive to the chosen \(l\)-mer length, and they have difficulty separating reads from low-abundance genomes because short \(l\)-mers are repeated within the genomes. In this paper, we propose a new variable-length \(l\)-mer counting method that addresses the repetition of short \(l\)-mers and thereby improves the accuracy of abundance-based binning approaches. Computational experiments show that a modified version of AbundanceBin (a commonly used binning method) incorporating the proposed method achieves higher accuracy than the original AbundanceBin. The software implementing the approach can be downloaded at http://fit.hcmute.edu.vn/bioinfo/MetaSeqBin/index.htm.
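To make the repetition problem concrete, here is a minimal sketch of fixed-length \(l\)-mer counting over a set of reads, together with a crude per-read abundance score. It is not the authors' variable-length method (its details are not given on this page) and not AbundanceBin's actual statistical model. The function names `count_lmers` and `read_abundance_scores` and the median-based score are hypothetical illustrations only.

```python
from collections import Counter
from statistics import median


def count_lmers(reads, l):
    """Count every length-l substring (l-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def read_abundance_scores(reads, l):
    """Hypothetical per-read abundance proxy: the median count of the read's l-mers.

    Reads from a genome sequenced at higher coverage tend to contain l-mers
    that occur more often in the whole read set. Reads shorter than l get 0.
    """
    counts = count_lmers(reads, l)
    scores = []
    for read in reads:
        lmer_counts = [counts[read[i:i + l]] for i in range(len(read) - l + 1)]
        scores.append(median(lmer_counts) if lmer_counts else 0)
    return scores


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "TTGCAGGC"]
    print(count_lmers(["ACGTACGT"], 4)["ACGT"])  # 2: repeated within a single read
    print(count_lmers(["ACGTACGT"], 8)["ACGTACGT"])  # 1: the longer l-mer is not repeated
    print(read_abundance_scores(reads, 4))  # [2, 2, 1]
```

With \(l = 4\), the \(l\)-mer ACGT is counted twice from a single read because it repeats within that read, so its count overstates coverage. With \(l = 8\) the same read contributes a count of 1. This is the short-\(l\)-mer repetition effect described in the abstract.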

Published

24-09-2014

How to Cite

[1] L. V. Vinh, T. V. Lang, and T. V. Hoai, “A novel \(l\)-mer counting method abundance-based binning of metagenomic reads”, J. Comput. Sci. Cybern., vol. 30, no. 3, pp. 267–277, Sep. 2014.

Issue

Vol. 30 No. 3 (2014)

Section

Computer Science

License

1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work presents our own results, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.