(1) Thinh, N. V.; Lang, T. V.; Thanh, V. T. RGTranCNet: Effective Image Captioning Model Using Cross-Attention and Semantic Knowledge. Vietnam J. Sci. Technol. 2025, 64, 123–138.