[1] Thinh, N.V. et al. 2025. RGTranCNet: Effective image captioning model using cross-attention and semantic knowledge. Vietnam Journal of Science and Technology. (Jul. 2025). DOI:https://doi.org/10.15625/2525-2518/22381.