Nguyen Van Thinh, Lang, T. V., & Van, V. T. T. (2024). OD-VR-Cap: Image captioning based on detecting and predicting relationships between objects. Journal of Computer Science and Cybernetics, 40(4), 327–346. https://doi.org/10.15625/1813-9663/20929