Graph-based and generative approaches to multi-document summarization

Tam Doan Thanh, Tan Minh Nguyen, Thai Binh Nguyen, Hoang Trung Nguyen, Hai Long Nguyen, Mai Vu Tran, Quang Thuy Ha, Ha Thanh Nguyen

Authors

  • Tam Doan Thanh (Viettel Group, Lane 7, Ton That Thuyet Street, Yen Hoa Ward, Cau Giay District, Ha Noi, Viet Nam)
  • Tan Minh Nguyen (VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, Ha Noi, Viet Nam)
  • Thai Binh Nguyen (VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, Ha Noi, Viet Nam)
  • Hoang Trung Nguyen (VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, Ha Noi, Viet Nam)
  • Hai Long Nguyen (VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, Ha Noi, Viet Nam)
  • Mai Vu Tran (VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, Ha Noi, Viet Nam)
  • Quang Thuy Ha (VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, Ha Noi, Viet Nam)
  • Ha Thanh Nguyen (National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan)

DOI:

https://doi.org/10.15625/1813-9663/18353

Keywords:

Multi-document summarization, abstractive summarization, NLP, graph-based, generative models.

Abstract

Multi-document summarization is a challenging problem in Natural Language Processing that has drawn considerable interest from the research community. In this paper, we propose a two-phase pipeline for the Vietnamese abstractive multi-document summarization task. The first phase is an extractive summarization stage comprising two systems. The first system employs a hybrid model that combines the TextRank algorithm with a text-correlation mechanism. The second system is a modified version of SummPip, an unsupervised graph-based method for multi-document summarization. The second phase performs abstractive summarization: generative models produce abstractive summaries from the outputs of the first phase. The proposed method achieves competitive results, surpassing many strong research teams to take first place in the AbMusu task (Vietnamese abstractive multi-document summarization), organized at the VLSP 2022 workshop.
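As a rough illustration of the extractive phase, the sketch below implements a plain TextRank-style sentence ranker: sentences are nodes in a graph weighted by word-overlap similarity, and PageRank scores are computed by power iteration. This is a minimal hypothetical sketch, not the paper's system; the similarity function, damping factor, and top-k selection are assumptions, and the paper's text-correlation mechanism and modified SummPip are not shown.

```python
import math

def similarity(s1, s2):
    """Word-overlap similarity normalized by log sentence lengths (TextRank)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank_extract(sentences, top_k=2, damping=0.85, iters=50):
    """Rank sentences with PageRank over the similarity graph,
    then return the top_k highest-scoring ones in document order."""
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out_weight = sum(sim[j])
                if sim[j][i] > 0 and out_weight > 0:
                    # Each neighbor j passes score proportional to edge weight.
                    rank += sim[j][i] / out_weight * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]
```

In a full pipeline the selected sentences would then be fed to a generative model (e.g., a pre-trained sequence-to-sequence model) to produce the final abstractive summary.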

References

J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.

F. Boudin and E. Morin, “Keyphrase extraction for n-best reranking in multi-sentence compression,” in North American Chapter of the Association for Computational Linguistics (NAACL), 2013.

G. Erkan and D. R. Radev, “LexRank: Graph-based lexical centrality as salience in text summarization,” Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.

A. R. Fabbri, I. Li, T. She, S. Li, and D. R. Radev, “Multi-news: A large-scale multi-document summarization dataset and abstractive hierarchical model,” arXiv preprint arXiv:1906.01749, 2019.

J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, and X. Cheng, “A deep look into neural ranking models for information retrieval,” Information Processing & Management, vol. 57, no. 6, p. 102067, 2020.

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

H. Jin, T. Wang, and X. Wan, “Multi-granularity interaction network for extractive and abstractive multi-document summarization,” in Proceedings of the 58th annual meeting of the association for computational linguistics, 2020, pp. 6244–6254.

M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.

C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Text Summarization Branches Out, 2004, pp. 74–81.

Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad, M. Lewis, and L. Zettlemoyer, “Multilingual denoising pre-training for neural machine translation,” Transactions of the Association for Computational Linguistics, vol. 8, pp. 726–742, 2020.

T. Mai-Vu, L. Hoang-Quynh, C. Duy-Cat, and N. Quoc-An, “VLSP 2022 – AbMusu challenge: Vietnamese abstractive multi-document summarization,” in Proceedings of the 9th International Workshop on Vietnamese Language and Speech Processing (VLSP 2022), 2022.

A. Mammone, M. Turchi, and N. Cristianini, “Support vector machines,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 1, no. 3, pp. 283–289, 2009.

R. Mihalcea and P. Tarau, “TextRank: Bringing order into text,” in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004, pp. 404–411.

A. Mishra, D. Patel, A. Vijayakumar, X. L. Li, P. Kapanipathi, and K. Talamadupula, “Looking beyond sentence-level natural language inference for question answering and text summarization,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 1322–1336.

N. Moratanch and S. Chitrakala, “A survey on abstractive text summarization,” in 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT). IEEE, 2016, pp. 1–7.

——, “A survey on extractive text summarization,” in 2017 international conference on computer, communication and signal processing (ICCCSP). IEEE, 2017, pp. 1–6.

D. Q. Nguyen, T. Vu, D. Q. Nguyen, M. Dras, and M. Johnson, “From word segmentation to POS tagging for Vietnamese,” in Proceedings of the Australasian Language Technology Association Workshop 2017, Brisbane, Australia, Dec. 2017, pp. 108–113. [Online]. Available: https://aclanthology.org/U17-1013

H.-T. Nguyen, M.-P. Nguyen, T.-H.-Y. Vuong, M.-Q. Bui, M.-C. Nguyen, T.-B. Dang, V. Tran, L.-M. Nguyen, and K. Satoh, “Transformer-based approaches for legal text processing,” The Review of Socionetwork Strategies, vol. 16, no. 1, pp. 135–155, 2022.

P. Over and J. Yen, “An introduction to DUC-2004,” National Institute of Standards and Technology, 2004.

L. Phan, H. Tran, H. Nguyen, and T. H. Trinh, “ViT5: Pretrained text-to-text transformer for Vietnamese language generation,” arXiv preprint arXiv:2205.06457, 2022.

D. Radev, E. Hovy, and K. McKeown, “Introduction to the special issue on summarization,” Computational Linguistics, vol. 28, no. 4, pp. 399–408, 2002.

C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485–5551, 2020.

N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using siamese BERT-networks,” arXiv preprint arXiv:1908.10084, 2019.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

N. L. Tran, D. M. Le, and D. Q. Nguyen, “BARTpho: Pre-trained sequence-to-sequence models for Vietnamese,” arXiv preprint arXiv:2109.09701, 2021.

S. Tu, J. Yu, F. Zhu, J. Li, L. Hou, and J.-Y. Nie, “UPER: Boosting multi-document summarization with an unsupervised prompt-based extractor,” in Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju, Republic of Korea: International Committee on Computational Linguistics, Oct. 2022, pp. 6315–6326. [Online]. Available: https://aclanthology.org/2022.coling-1.550

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.

T. Vu, D. Q. Nguyen, D. Q. Nguyen, M. Dras, and M. Johnson, “VnCoreNLP: A Vietnamese natural language processing toolkit,” arXiv preprint arXiv:1801.01331, 2018.

Y. T.-H. Vuong, Q. M. Bui, H.-T. Nguyen, T.-T.-T. Nguyen, V. Tran, X.-H. Phan, K. Satoh, and L.-M. Nguyen, “SM-BERT-CR: A deep learning approach for case law retrieval with supporting model,” Artificial Intelligence and Law, pp. 1–28, 2022.

M. Yasunaga, R. Zhang, K. Meelu, A. Pareek, K. Srinivasan, and D. Radev, “Graph-based neural multi-document summarization,” in Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), 2017, pp. 452–462.

J. Zhang, Y. Zhao, M. Saleh, and P. Liu, “PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization,” in International Conference on Machine Learning. PMLR, 2020, pp. 11328–11339.

J. Zhao, M. Liu, L. Gao, Y. Jin, L. Du, H. Zhao, H. Zhang, and G. Haffari, “SummPip: Unsupervised multi-document summarization with sentence graph compression,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1949–1952.

Published

23-08-2024

How to Cite

[1] T. D. Thanh, “Graph-based and generative approaches to multi-document summarization”, JCC, vol. 40, no. 3, Aug. 2024.

Section: Articles