ESTIMATING AMINO ACID SUBSTITUTION MODELS AND ROOTING BACTERIAL TREES
DOI: https://doi.org/10.15625/1813-9663/19324
Keywords: Amino acid substitution models, Bacterial protein sequences, Time-non-reversible models, Time-reversible models.
Abstract
Reconstructing phylogenetic trees from protein sequences normally requires empirical amino acid substitution models to calculate the likelihood of trees or the genetic distances between species. The tree of life is divided into three domains: Eukaryotes, Archaea, and Bacteria. Amino acid substitution models have been studied intensively for decades, but few are specific to Bacteria. Rooting bacterial trees remains a challenging problem in phylogenetic analysis because of the long branch separating Bacteria from the other domains. The two main objectives of this paper are to estimate the amino acid substitution models Q.bac and NQ.bac for bacterial evolutionary studies and to assess the capability of the time-non-reversible model NQ.bac to root bacterial trees. Experiments showed that both the time-reversible model (Q.bac) and the time-non-reversible model (NQ.bac) were significantly better than existing models in analyzing bacterial protein sequences. Interestingly, the time-non-reversible model NQ.bac helped reconstruct maximum likelihood bacterial trees with reliable roots for 177 (23.7%) of the 748 test alignments without requiring predefined outgroups. This outgroup-free rooting method enhances studies of bacterial evolution. We recommend that researchers employ both the Q.bac and NQ.bac models when analyzing bacterial protein sequences. The datasets and scripts used in this manuscript are available at https://doi.org/10.6084/m9.figshare.20457264.
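Why can a time-non-reversible model such as NQ.bac root a tree without an outgroup? Under a time-reversible model, which satisfies detailed balance (pi_i * Q_ij = pi_j * Q_ji), the likelihood is the same wherever the root is placed (Felsenstein's pulley principle), so the alignment carries no information about the root. A non-reversible model breaks this symmetry. The sketch below illustrates the idea on a hypothetical 3-state toy model, not on the 20x20 amino acid matrices of Q.bac or NQ.bac; the rate values, the two-taxon tree, and every function name are illustrative assumptions rather than the authors' implementation.

```python
# Toy 3-state illustration with hypothetical rates (not the paper's 20x20
# Q.bac/NQ.bac matrices): the root position changes the likelihood only when
# the rate matrix violates detailed balance, pi_i * Q_ij == pi_j * Q_ji.


def mat_mult(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=20, squarings=10):
    """P(t) = exp(Q t) via scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this update, term holds a^k / k!
        term = [[x / k for x in row] for row in mat_mult(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mult(result, result)
    return result


def with_diagonal(off):
    """Set each diagonal entry so that every row of the rate matrix sums to zero."""
    n = len(off)
    q = [row[:] for row in off]
    for i in range(n):
        q[i][i] = -sum(off[i][j] for j in range(n) if j != i)
    return q


def stationary(q):
    """Approximate the stationary distribution by a row of P(t) for large t."""
    return transition_matrix(q, 100.0)[0]


def is_reversible(q, pi, tol=1e-9):
    """Check detailed balance: pi_i * Q_ij == pi_j * Q_ji for all i, j."""
    n = len(q)
    return all(abs(pi[i] * q[i][j] - pi[j] * q[j][i]) < tol
               for i in range(n) for j in range(n))


def site_likelihood(q, pi, a, b, total, frac):
    """Likelihood of one site with states (a, b) on a two-taxon tree.

    The root sits at fraction `frac` of the branch of length `total`:
    L = sum_r pi_r * P_ra(frac * total) * P_rb((1 - frac) * total).
    """
    p_left = transition_matrix(q, frac * total)
    p_right = transition_matrix(q, (1.0 - frac) * total)
    return sum(pi[r] * p_left[r][a] * p_right[r][b] for r in range(len(pi)))


# Reversible model: Q_ij = s_ij * pi_j with symmetric exchangeabilities s_ij.
PI_REV = [0.2, 0.3, 0.5]
EXCHANGE = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]
Q_REV = with_diagonal([[EXCHANGE[i][j] * PI_REV[j] for j in range(3)]
                       for i in range(3)])

# Non-reversible model: arbitrary off-diagonal rates with a cyclic bias.
Q_NONREV = with_diagonal([[0.0, 1.5, 0.2], [0.3, 0.0, 1.2], [1.0, 0.4, 0.0]])
PI_NONREV = stationary(Q_NONREV)

if __name__ == "__main__":
    for name, q, pi in [("reversible", Q_REV, PI_REV),
                        ("non-reversible", Q_NONREV, PI_NONREV)]:
        values = [site_likelihood(q, pi, 0, 1, 0.5, f) for f in (0.1, 0.5, 0.9)]
        print(name, is_reversible(q, pi), ["%.6f" % v for v in values])
```

For the reversible matrix the three likelihoods agree to numerical precision, because pi_r * P_ra(s) = pi_a * P_ar(s) collapses the sum to pi_a * P_ab(total) for any root position. For the non-reversible matrix they differ, and it is this kind of difference, summed over many sites and taxa, that a maximum likelihood analysis can use to compare candidate root placements.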
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work is our own result, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.