COMOM - VLSP 2023: A two-stage framework for identifying and extracting Vietnamese comparative opinion quintuples
DOI: https://doi.org/10.15625/1813-9663/20607
Keywords: Comparative opinion mining, Vietnamese language, two-stage framework, multi-task prompt tuning.
Abstract
Comparative opinion mining is an important subtask of opinion mining. It aims to identify comparative reviews and extract the comparative elements as quintuples. This task, called Comparative Opinion Quintuple Extraction (COQE), comprises two main sub-tasks: Comparative Sentence Identification (CSI) and Comparative Element Extraction (CEE). In this paper, we introduce an effective two-stage framework for the COQE task designed specifically for the Vietnamese language. The first stage fine-tunes several BERT-based language models to identify comparative sentences. We then formulate comparative element extraction as a conditional text generation problem and apply a multi-task instruction prompting architecture built on generative language models. We also employ a data augmentation technique to increase the number of training samples. Experimental results on the VCOM dataset [15] demonstrate that our framework outperforms existing methods and achieves state-of-the-art performance on the test set. We further conduct a detailed analysis to provide insights for future research on this topic.
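To make the two-stage design concrete, the following minimal sketch wires such a pipeline together with the Hugging Face Transformers library [34]. The checkpoint names (vinai/phobert-base, VietAI/vit5-base), the instruction prompt, and the quintuple output format are illustrative assumptions chosen for exposition, not the exact configuration reported in the paper.

# Minimal sketch of a two-stage COQE pipeline, assuming PhoBERT for
# stage 1 and ViT5 for stage 2 (illustrative choices, not the paper's
# exact setup). PhoBERT normally expects word-segmented Vietnamese
# input; segmentation is omitted here for brevity.
import torch
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Stage 1: Comparative Sentence Identification (CSI), binary classification.
# The classification head below is freshly initialized; in practice it
# would be fine-tuned on comparative vs. non-comparative sentences.
csi_tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
csi_model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/phobert-base", num_labels=2
)

def is_comparative(sentence: str) -> bool:
    inputs = csi_tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = csi_model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1

# Stage 2: Comparative Element Extraction (CEE) as conditional text
# generation. The prompt string is a hypothetical placeholder for the
# paper's multi-task instruction prompting setup.
cee_tokenizer = AutoTokenizer.from_pretrained("VietAI/vit5-base")
cee_model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/vit5-base")

PROMPT = "Extract (subject; object; aspect; predicate; comparison label): "

def extract_quintuples(sentence: str) -> str:
    inputs = cee_tokenizer(PROMPT + sentence, return_tensors="pt", truncation=True)
    output_ids = cee_model.generate(**inputs, max_new_tokens=128)
    return cee_tokenizer.decode(output_ids[0], skip_special_tokens=True)

def coqe(sentence: str):
    # Stage 1 acts as a filter: only sentences judged comparative are
    # passed to the stage-2 generator.
    return extract_quintuples(sentence) if is_comparative(sentence) else None

The cascade keeps the generative model focused on sentences that actually contain comparisons, which is the motivation for separating CSI from CEE in the framework described above.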
References
N. X. Bach, D. V. Pham, N. D. Tai, and T. M. Phuong, “Mining Vietnamese comparative sentences for sentiment analysis,” in 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), 2015, pp. 162–167.
B. Liu, Sentiment Analysis and Opinion Mining. Springer, 2012.
T. V. Bui, T. O. Tran, and P. Le-Hong, “Improving sequence tagging for Vietnamese text using transformer-based neural models,” in Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, M. L. Nguyen, M. C. Luong, and S. Song, Eds., Oct. 2020, pp. 13–20.
Z. Chi, L. Dong, F. Wei, N. Yang, S. Singhal, W. Wang, X. Song, X.-L. Mao, H. Huang, and M. Zhou, “InfoXLM: An information-theoretic framework for cross-lingual language model pre-training,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, and Y. Zhou, Eds., Jun. 2021, pp. 3576–3588.
Z. Chi, L. Dong, B. Zheng, S. Huang, X.-L. Mao, H. Huang, and F. Wei, “Improving pretrained cross-lingual language models via self-labeled word alignment,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Aug. 2021, pp. 3418–3430.
H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, A. Chowdhery, A. Castro-Ros, M. Pellat, K. Robinson, D. Valter, S. Narang, G. Mishra, A. Yu, V. Zhao, Y. Huang, A. Dai, H. Yu, S. Petrov, E. H. Chi, J. Dean, J. Devlin, A. Roberts, D. Zhou, Q. V. Le, and J. Wei, “Scaling instruction-finetuned language models,” arXiv preprint arXiv:2210.11416, 2022.
A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, “Unsupervised cross-lingual representation learning at scale,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, Eds., Jul. 2020, pp. 8440–8451.
V. T. Dang, D. N. Hao, and N. L. T. Nguyen, “Vietnamese sentiment analysis: An overview and comparative study of fine-tuning pretrained language models,” ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 22, no. 6, Jun. 2023.
V. T. Dang, D. N. Hao, and N. L. T. Nguyen, “A systematic literature review on Vietnamese aspect-based sentiment analysis,” ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 22, no. 8, Aug. 2023.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, and T. Solorio, Eds., Jun. 2019, pp. 4171–4186.
F. Gao, Y. Liu, W. Fu, M. Zhang, A. Ballard, and L. Zhao, “End-to-end comparative opinion quintuple extraction as bipartite set prediction with dynamic structure pruning,” Expert Systems with Applications, 2023.
L. Ha, B. Tran, P. Le, T. Nguyen, D. Nguyen, N. Pham, and D. Huynh, “Unveiling comparative sentiments in Vietnamese product reviews: A sequential classification framework,” arXiv preprint arXiv:2001.0108, 2024.
P. He, J. Gao, and W. Chen, “DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing,” in The Eleventh International Conference on Learning Representations, 2023.
N. Jindal and B. Liu, "Identifying comparative sentences in text documents," in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '06. New York, NY, USA: Association for Computing Machinery, 2006, pp. 244-251.
H.-Q. Le, D.-C. Can, K.-V. Nguyen, and M.-V. Tran, “Overview of the VLSP 2023 - ComOM shared task: A data challenge for comparative opinion mining from Vietnamese product reviews,” 2024.
L. Liu, R. Xia, and J. Yu, "Comparative opinion quintuple extraction from product reviews," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-T. Yih, Eds., Nov. 2021, pp. 3955-3965.
N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. Le Scao, M. S. Bari, S. Shen, Z. X. Yong, H. Schoelkopf, X. Tang, D. Radev, A. F. Aji, K. Almubarak, S. Albanie, Z. Alyafeai, A. Webson, E. Raff, and C. Raffel, “Crosslingual generalization through multitask finetuning,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds., Jul. 2023, pp. 15 991–16 111.
D. Q. Nguyen and A. T. Nguyen, “PhoBERT: Pre-trained language models for Vietnamese,” in Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds., Online, Nov. 2020, pp. 1037–1042.
N. Jindal and B. Liu, “Mining comparative sentences and relations,” in Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, ser. AAAI'06. AAAI Press, 2006, pp. 1331–1336.
L. Phan, H. Tran, H. Nguyen, and T. H. Trinh, “ViT5: Pretrained text-to-text transformer for Vietnamese language generation,” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Student Research Workshop, D. Ippolito, L. H. Li, M. L. Pacheco, D. Chen, and N. Xue, Eds., 2022, pp. 136–142.
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485-5510, 2020.
V. Sanh, A. Webson, C. Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, T. Le Scao, A. Raja, M. Dey, et al., “Multitask prompted training enables zero-shot task generalization,” in International Conference on Learning Representations, 2022.
C. D. Tran, N. H. Pham, A. T. Nguyen, T. S. Hy, and T. Vu, “ViDeBERTa: A powerful pre-trained language model for Vietnamese,” in Findings of the Association for Computational Linguistics: EACL 2023, A. Vlachos and I. Augenstein, Eds., May 2023, pp. 107–118.
N. L. Tran, D. M. Le, and D. Q. Nguyen, “BARTpho: Pre-trained sequence-to-sequence models for Vietnamese,” arXiv preprint arXiv:2109.09701, 2021.
K. D. Varathan, A. Giachanou, and F. Crestani, “Comparative opinion mining: A review,” Journal of the Association for Information Science and Technology, vol. 68, 2017.
T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Q. Liu and D. Schlangen, Eds., Oct. 2020, pp. 38–45.
Q. Xu, Y. Hong, F. Zhao, K. Song, J. Chen, Y. Kang, and G. Zhou, “Gen-based end-to-end model for comparative opinion mining,” in 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
Q. Xu, Y. Hong, F. Zhao, K. Song, Y. Kang, J. Chen, and G. Zhou, “Low-resource comparative opinion quintuple extraction by data augmentation with prompting,” in Findings of the Association for Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, and K. Bali, Eds., Dec. 2023, pp. 3892–3897.
L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mT5: A massively multilingual pre-trained text-to-text transformer,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, and Y. Zhou, Eds., Online, Jun. 2021, pp. 483–498.
Z. Yang, F. Xu, J. Yu, and R. Xia, “UniCOQE: Unified comparative opinion quintuple extraction as a set,” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds., Jul. 2023, pp. 12 229–12 240.
W. Zhang, X. Li, Y. Deng, L. Bing, and W. Lam, “Towards generative aspect-based sentiment analysis,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Aug. 2021, pp. 504–510.