Thinh, N. V., Lang, T. V. and Thanh, V. T. (2025) “RGTranCNet: Effective image captioning model using cross-attention and semantic knowledge”, Vietnam Journal of Science and Technology. Hanoi, VN. doi: 10.15625/2525-2518/22381.