Tuan Minh Luu, Huong Thanh Le, Tan Minh Hoang
  • Tuan Minh Luu, National Economics University, Hanoi, Vietnam
  • Huong Thanh Le, Hanoi University of Science and Technology
  • Tan Minh Hoang, Hanoi University of Science and Technology, Hanoi, Vietnam




Extractive Summarization, BERT multilingual, CNN, Encoder-Decoder, TF-IDF feature


Deep neural networks have been applied successfully to extractive text summarization when large training datasets are available. However, when the training dataset is not large enough, these models show limitations that degrade the quality of the generated summary. In this paper, we propose an extractive summarization system based on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained multilingual BERT model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences are removed from the output summary by the Maximal Marginal Relevance method. Our system is evaluated on both English and Vietnamese using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results than existing works on the same datasets. This confirms that our approach can be applied effectively to summarize both English and Vietnamese text.
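The redundancy-removal step can be illustrated with a minimal sketch of Maximal Marginal Relevance over TF-IDF sentence vectors. This is not the system described above: the function names (`tfidf_vectors`, `cosine`, `mmr_select`), whitespace tokenization, the smoothed IDF formula, and the trade-off parameter `lam` are all assumptions made for illustration. The `relevance` scores stand in for whatever the sentence-selection network would output, and similarity is computed from TF-IDF alone rather than from BERT embeddings combined with TF-IDF.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: term -> weight) per sentence.

    Assumption: whitespace tokenization and a smoothed IDF, chosen for brevity.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(vectors, relevance, k, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    Each step takes the candidate maximizing
        lam * relevance[i] - (1 - lam) * max_j sim(i, j)
    where j ranges over the sentences already selected.
    """
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical input: two identical sentences and one unrelated sentence.
sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "stocks fell sharply on monday",
]
relevance = [0.9, 0.85, 0.5]  # placeholder scores, not model output
vectors = tfidf_vectors(sentences)
summary_ids = mmr_select(vectors, relevance, k=2, lam=0.5)
```

With `lam=0.5` the duplicate sentence is penalized by its similarity to the one already chosen, so the unrelated sentence is selected instead. Setting `lam=1.0` disables the penalty and ranks by relevance alone.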


