Thinh, N. V., Lang, T. V. and Thanh, V. T. (2025) “RGTranCNet: Effective image captioning model using cross-attention and semantic knowledge”, Vietnam Journal of Science and Technology. Hanoi, VN, 64(1), pp. 123–138. doi: 10.15625/2525-2518/22381.