Phrasal semantic distance for Vietnamese textual document retrieval

Tuyen Thi-Thanh Do, Dang Tuan Nguyen


In this paper, a computational semantic method is proposed to estimate the phrasal semantic distance used in our model of a Vietnamese document retrieval system. The semantic distance between phrases is defined in terms of semantic classes and semantic relations so that it reflects how different two given phrases are. To estimate the semantic distance, the semantic classes of a phrase are first identified using an n-gram model. The semantic relations among these classes are then identified using a Vietnamese Lexicon Ontology, a handcrafted ontology that explicitly defines semantic classes and their potential relations in the Vietnamese language. For evaluation, a phrasal semantic retrieval system was built and tested on a data set of 720 phrases and 30 queries. The experimental results show a precision of 96.6% and a recall of 78.4%.
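For readers unfamiliar with the two evaluation measures reported above, the following is a minimal sketch of how precision and recall are conventionally computed for a single query. The function name, the phrase identifiers, and the toy sets are hypothetical and are not taken from the paper; the abstract also does not state how the per-query values were aggregated over the 30 queries, so no averaging scheme is assumed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: phrase IDs returned by the retrieval system
    relevant:  phrase IDs judged relevant to the query
    Both arguments are hypothetical inputs used only for illustration.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant phrases that were retrieved
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (not from the paper): 4 phrases retrieved, 5 relevant, 3 in common.
retrieved = {"p1", "p2", "p3", "p4"}
relevant = {"p1", "p2", "p3", "p5", "p6"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

A high precision paired with a lower recall, as in the reported figures, means that most returned phrases are relevant while some relevant phrases are not returned.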


Keywords: Lexicon ontology, phrasal semantic analysis, semantic class, semantic distance, semantic information retrieval.


Journal of Computer Science and Cybernetics ISSN: 1813-9663

Published by Vietnam Academy of Science and Technology