Thinh, N. V., Lang, T. V., & Van, V. T. T. (2024). OD-VR-Cap: Image captioning based on detecting and predicting relationships between objects. Journal of Computer Science and Cybernetics, 40(4), 327–346. https://doi.org/10.15625/1813-9663/20929