CLW_SUMO: A hybrid deep learning model for predicting protein SUMOylation sites
DOI:
https://doi.org/10.15625/1813-9663/19626
Keywords:
SUMOylation, prediction, convolutional neural networks, long short-term memory, natural language processing, Word2Vec.
Abstract
Protein SUMOylation is one of the most important post-translational modifications in eukaryotic species and plays significant roles in many biological processes. The mechanism underlying SUMOylation is implicated in many common serious diseases, such as breast cancer, cardiac disease, Parkinson’s disease, and Alzheimer’s disease. Because of these roles, an in-depth understanding of SUMOylation and its mechanism is currently a topic of wide scientific interest. In this study, we propose a novel approach, called CLW-SUMO, for predicting SUMOylation sites using a hybrid deep learning model that combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks, with Word2Vec as the word embedding technique. In 10-fold cross-validation, the proposed model achieves the best performance, with an accuracy of 82.33%, an MCC of 0.589, and an AUC of 0.829. In independent testing, it likewise obtains the highest performance, reaching an accuracy of 90.03%, an MCC of 0.773, and an AUC of 0.889. Compared with several existing SUMOylation predictors on the independent dataset, the proposed model also exhibits the highest performance (accuracy 90.03%, MCC 0.773). We hope that our findings will help researchers in their studies of protein SUMOylation site identification.
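The abstract reports three metrics: accuracy, MCC, and AUC. As a minimal sketch, assuming the standard definitions (MCC as the Matthews correlation coefficient, AUC as the area under the ROC curve), they could be computed for a binary site/non-site classifier as shown here. The function names and the toy labels and scores are illustrative assumptions; they are not the authors' code or data.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = SUMOylation site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count as one half). Quadratic in sample count, fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (made-up values, for illustration only)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts), auc(y_true, scores))
```

MCC is often reported alongside accuracy for site prediction because positive and negative samples are frequently imbalanced, and accuracy alone can look high for a model that favors the majority class.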
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.