COMOM - VLSP 2023 Two-stage framework for identifying and extracting Vietnamese comparative opinion quintuple extraction

Dang Van Thin, Nguyen Thi Thuy, Duong Ngoc Hao, Ngan Luu-Thuy Nguyen

Authors

  • Dang Van Thin, University of Information Technology, VNU-HCM
  • Nguyen Thi Thuy, VNU-HCM University of Information Technology, Quarter 6, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Viet Nam
  • Duong Ngoc Hao, VNU-HCM University of Information Technology, Quarter 6, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Viet Nam
  • Ngan Luu-Thuy Nguyen, VNU-HCM University of Information Technology, Quarter 6, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Viet Nam

DOI:

https://doi.org/10.15625/1813-9663/20607

Keywords:

Comparative mining, Vietnamese language, two-stage frameworks, multi-task prompt tuning.

Abstract

Comparative opinion mining is an important subtask of opinion mining. It aims to identify comparative reviews and extract their comparative elements as quintuples. This task, called Comparative Opinion Quintuple Extraction (COQE), has two main sub-tasks: Comparative Sentence Identification (CSI) and Comparative Element Extraction (CEE). In this paper, we introduce an effective two-stage framework for the COQE task designed specifically for Vietnamese. The first stage fine-tunes several BERT-based language models to identify comparative sentences. In the second stage, we formulate comparative element extraction as a conditional text generation problem and apply a multi-task instruction prompting architecture based on generative language models. We also employ a data augmentation technique to increase the number of training samples. Our experimental results on the VCOM dataset [15] demonstrate that our framework outperforms existing methods and achieves state-of-the-art performance on the test set. We also conduct a detailed analysis to provide insights for future research on this topic.
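The abstract describes a sentence-level filter (CSI) followed by extraction cast as conditional text generation (CEE). The Python sketch below illustrates that general idea under explicit assumptions: the `|`/`;` serialization format, the English instruction strings, the auxiliary single-element tasks, the `COM+` label, and every name in it (`Quintuple`, `linearize`, `parse`, `build_training_pairs`, `extract`) are hypothetical illustrations, not the authors' templates or code. In a real system `is_comparative` would wrap a fine-tuned BERT-style classifier and `generate` a fine-tuned sequence-to-sequence model; here both are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical serialization format (an assumption, not the paper's template).
FIELD_SEP = "|"   # separates the five elements of one quintuple
TUPLE_SEP = ";"   # separates quintuples within one target string
EMPTY = "none"    # placeholder for an element the sentence does not mention


@dataclass(frozen=True)
class Quintuple:
    subject: str
    object: str
    aspect: str
    predicate: str
    label: str  # comparison type such as "COM+"; this label set is assumed


# Hypothetical multi-task instructions: the main quintuple task plus
# auxiliary single-element tasks that share one generative model.
INSTRUCTIONS: Dict[str, str] = {
    "quintuple": "Extract comparative quintuples: ",
    "subject": "Extract the compared subjects: ",
    "object": "Extract the compared objects: ",
    "aspect": "Extract the compared aspects: ",
}


def linearize(quints: List[Quintuple]) -> str:
    """Serialize gold quintuples into one target string for a seq2seq model."""
    rows = []
    for q in quints:
        fields = [q.subject, q.object, q.aspect, q.predicate, q.label]
        rows.append(f" {FIELD_SEP} ".join(f or EMPTY for f in fields))
    return f" {TUPLE_SEP} ".join(rows)


def parse(text: str) -> List[Quintuple]:
    """Inverse of linearize(); drops chunks that do not have five fields."""
    quints = []
    for chunk in text.split(TUPLE_SEP):
        fields = [f.strip() for f in chunk.split(FIELD_SEP)]
        if len(fields) != 5:
            continue  # the generation broke the template
        quints.append(Quintuple(*("" if f == EMPTY else f for f in fields)))
    return quints


def build_training_pairs(sentence: str,
                         quints: List[Quintuple]) -> List[Tuple[str, str]]:
    """Create one (prompt, target) pair per task from one annotated sentence."""
    pairs = [(INSTRUCTIONS["quintuple"] + sentence, linearize(quints))]
    for task in ("subject", "object", "aspect"):
        values = [getattr(q, task) or EMPTY for q in quints]
        pairs.append((INSTRUCTIONS[task] + sentence, f" {TUPLE_SEP} ".join(values)))
    return pairs


def extract(sentence: str,
            is_comparative: Callable[[str], bool],
            generate: Callable[[str], str]) -> List[Quintuple]:
    """Two-stage inference: CSI filter first, then CEE by generation."""
    if not is_comparative(sentence):  # stage 1: comparative sentence identification
        return []
    return parse(generate(INSTRUCTIONS["quintuple"] + sentence))  # stage 2: generation


if __name__ == "__main__":
    # "The camera of the iPhone 14 is better than the Galaxy S22."
    sentence = "Camera của iPhone 14 đẹp hơn Galaxy S22."
    gold = [Quintuple("iPhone 14", "Galaxy S22", "camera", "đẹp hơn", "COM+")]
    toy_classifier = lambda s: "hơn" in s          # toy stand-in for a BERT classifier
    toy_generator = lambda prompt: linearize(gold)  # toy stand-in for a seq2seq model
    print(extract(sentence, toy_classifier, toy_generator))
    print(build_training_pairs(sentence, gold))
```

Serializing quintuples into a fixed template lets a single generative model emit a variable number of tuples per sentence, and a strict parser gives a simple way to discard malformed generations. A production template would need escaping if an element could itself contain `|` or `;`.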


References

N. X. Bach, P. D. Van, N. D. Tai, and T. M. Phuong, “Mining Vietnamese comparative sentences for sentiment analysis,” in 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), 2015, pp. 162–167.

B. Liu, Sentiment Analysis and Opinion Mining. Springer Nature Switzerland AG, 2012.

T. V. Bui, T. O. Tran, and P. Le-Hong, “Improving sequence tagging for Vietnamese text using transformer-based neural models,” in Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, M. L. Nguyen, M. C. Luong, and S. Song, Eds., Oct. 2020, pp. 13–20.

Z. Chi, L. Dong, F. Wei, N. Yang, S. Singhal, W. Wang, X. Song, X.-L. Mao, H. Huang, and M. Zhou, “InfoXLM: An information-theoretic framework for cross-lingual language model pre-training,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, and Y. Zhou, Eds., Jun. 2021, pp. 3576–3588.

Z. Chi, L. Dong, B. Zheng, S. Huang, X.-L. Mao, H. Huang, and F. Wei, “Improving pretrained cross-lingual language models via self-labeled word alignment,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Aug. 2021, pp. 3418–3430.

H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, A. Chowdhery, A. Castro-Ros, M. Pellat, K. Robinson, D. Valter, S. Narang, G. Mishra, A. Yu, V. Zhao, Y. Huang, A. Dai, H. Yu, S. Petrov, E. H. Chi, J. Dean, J. Devlin, A. Roberts, D. Zhou, Q. V. Le, and J. Wei, "Scaling instruction-finetuned language models," 2022.

A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, "Unsupervised cross-lingual representation learning at scale," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, Eds., Jul. 2020, pp. 8440-8451.

V. T. Dang, D. N. Hao, and N. L. T. Nguyen, "Vietnamese sentiment analysis: An overview and comparative study of fine-tuning pretrained language models," ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 22, no. 6, Jun 2023.

V. T. Dang, D. N. Hao, and N. L. T. Nguyen, "A systematic literature review on Vietnamese aspect-based sentiment analysis," ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 22, no. 8, Aug 2023.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, and T. Solorio, Eds., Jun 2019, pp. 4171-4186.

F. Gao, Y. Liu, W. Fu, M. Zhang, A. Ballard, and L. Zhao, "End-to-end comparative opinion quintuple extraction as bipartite set prediction with dynamic structure pruning," Expert Systems with Applications, 2023.

L. Ha, B. Tran, P. Le, T. Nguyen, D. Nguyen, N. Pham, and D. Huynh, "Unveiling comparative sentiments in Vietnamese product reviews: A sequential classification framework," arXiv preprint arXiv:2001.0108, 2024.

P. He, J. Gao, and W. Chen, "DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing," in The Eleventh International Conference on Learning Representations, 2023.

N. Jindal and B. Liu, "Identifying comparative sentences in text documents," in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '06. New York, NY, USA: Association for Computing Machinery, 2006, pp. 244-251.

H.-Q. Le, D.-C. Can, K.-V. Nguyen, and M.-V. Tran, "Overview of the VLSP 2023 - ComOM shared task: A data challenge for comparative opinion mining from Vietnamese product reviews," 2024.

L. Liu, R. Xia, and J. Yu, "Comparative opinion quintuple extraction from product reviews," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-T. Yih, Eds., Nov. 2021, pp. 3955-3965.

N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. Le Scao, M. S. Bari, S. Shen, Z. X. Yong, H. Schoelkopf, X. Tang, D. Radev, A. F. Aji, K. Almubarak, S. Albanie, Z. Alyafeai, A. Webson, E. Raff, and C. Raffel, “Crosslingual generalization through multitask finetuning,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds., Jul. 2023, pp. 15 991–16 111.

D. Q. Nguyen and A. T. Nguyen, “PhoBERT: Pre-trained language models for Vietnamese,” in Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds., Online, Nov. 2020, pp. 1037-1042.

N. Jindal and B. Liu, “Mining comparative sentences and relations,” in Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, ser. AAAI'06. AAAI Press, 2006, pp. 1331-1336.

L. Phan, H. Tran, H. Nguyen, and T. H. Trinh, “ViT5: Pretrained text-to-text transformer for Vietnamese language generation,” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Student Research Workshop, D. Ippolito, L. H. Li, M. L. Pacheco, D. Chen, and N. Xue, Eds., 2022, pp. 136-142.

C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485-5551, 2020.

V. Sanh, A. Webson, C. Raffel, S. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, A. Raja, M. Dey, et al., “Multi-task prompted training enables zero-shot task generalization,” in International Conference on Learning Representations, 2022.

C. D. Tran, N. H. Pham, A. T. Nguyen, T. S. Hy, and T. Vu, “ViDeBERTa: A powerful pre-trained language model for Vietnamese,” in Findings of the Association for Computational Linguistics: EACL 2023, A. Vlachos and I. Augenstein, Eds., May 2023, pp. 107-118.

N. L. Tran, D. M. Le, and D. Q. Nguyen, “BARTpho: Pre-trained sequence-to-sequence models for Vietnamese,” arXiv preprint arXiv:2109.09701, 2021.

K. D. Varathan, A. Giachanou, and F. Crestani, “Comparative opinion mining: A review,” Journal of the Association for Information Science and Technology, vol. 68, 2017.

T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Q. Liu and D. Schlangen, Eds., Oct. 2020, pp. 38-45.

Q. Xu, Y. Hong, F. Zhao, K. Song, J. Chen, Y. Kang, and G. Zhou, “Gen-based end-to-end model for comparative opinion mining,” in 2023 International Joint Conference on Neural Networks (IJCNN), 2023, pp. 1, 612.

Q. Xu, Y. Hong, F. Zhao, K. Song, Y. Kang, J. Chen, and G. Zhou, “Low-resource comparative opinion quintuple extraction by data augmentation with prompting,” in Findings of the Association for Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, and K. Bali, Eds., Dec. 2023, pp. 3892-3897.

L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mT5: A massively multilingual pre-trained text-to-text transformer,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, and Y. Zhou, Eds., Online, Jun. 2021, pp. 483–498.

Z. Yang, F. Xu, J. Yu, and R. Xia, “UniCOQE: Unified comparative opinion quintuple extraction as a set,” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds., Jul. 2023, pp. 12 229–12 240.

W. Zhang, X. Li, Y. Deng, L. Bing, and W. Lam, “Towards generative aspect-based sentiment analysis,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Aug. 2021, pp. 504–510.


Published

30-03-2025

How to Cite

[1]
D. V. Thin, N. T. Thuy, D. N. Hao, and N. L.-T. Ngan, “COMOM - VLSP 2023 Two-stage framework for identifying and extracting Vietnamese comparative opinion quintuple extraction”, J. Comput. Sci. Cybern., no. 1, Mar. 2025.


Section

Articles
