Phrasal semantic distance for Vietnamese textual document retrieval

Tuyen Thi-Thanh Do, Dang Tuan Nguyen

Authors

  • Tuyen Thi-Thanh Do, University of Information Technology, VNU-HCM
  • Dang Tuan Nguyen, University of Information Technology, VNU-HCM

DOI:

https://doi.org/10.15625/1813-9663/31/3/5923

Keywords:

Lexicon ontology, phrasal semantic analysis, semantic class, semantic distance, semantic information retrieval.

Abstract

In this paper, a computational semantic method is proposed to estimate the phrasal semantic distance used in our model of a Vietnamese document retrieval system. The semantic distance between two phrases is defined in terms of semantic classes and semantic relations, so that it reflects how different the two phrases are. To estimate this distance, the semantic classes of a phrase are first identified using an n-gram model. The semantic relations between these classes are then identified using a Vietnamese Lexicon Ontology, a handcrafted ontology that explicitly defines the semantic classes of the Vietnamese language and their potential relations. For evaluation, a phrasal semantic retrieval system was built and tested on a data set of 720 phrases and 30 queries. The experimental results show a precision of 96.6% and a recall of 78.4%.
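The abstract describes a pipeline (n-gram identification of semantic classes, ontology lookup of relations between those classes, then a distance between phrases) but does not give the distance formula. What follows is a minimal sketch of that pipeline under stated assumptions, not the authors' method. The toy tables LEXICON and RELATIONS are hypothetical stand-ins for the Vietnamese Lexicon Ontology. Greedy longest-match lookup over syllable n-grams is an assumed way of applying the n-gram model. The distance used by phrasal_distance, one minus the Jaccard similarity of the class and relation features, is a swapped-in illustrative measure. The precision_recall helper uses the standard definitions behind the reported 96.6% and 78.4% figures.

```python
import unicodedata
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: syllable n-grams -> semantic class.
# NOT the paper's Vietnamese Lexicon Ontology; illustration only.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("sinh", "viên"): "PERSON",      # "student"
    ("giáo", "viên"): "PERSON",      # "teacher"
    ("đọc",): "ACTION_READ",         # "read"
    ("sách",): "DOCUMENT",           # "book"
    ("tài", "liệu"): "DOCUMENT",     # "document"
}

# Hypothetical ontology fragment: (class, class) pairs that may be linked.
RELATIONS: Dict[Tuple[str, str], str] = {
    ("PERSON", "ACTION_READ"): "agent",
    ("ACTION_READ", "DOCUMENT"): "theme",
}

MAX_N = max(len(key) for key in LEXICON)


def semantic_classes(phrase: str) -> List[str]:
    """Assumed approach: greedy longest-match n-gram lookup over syllables."""
    syllables = unicodedata.normalize("NFC", phrase).lower().split()
    classes: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest n-gram first, shrinking until the lexicon has a hit.
        for n in range(min(MAX_N, len(syllables) - i), 0, -1):
            cls = LEXICON.get(tuple(syllables[i:i + n]))
            if cls is not None:
                classes.append(cls)
                i += n
                break
        else:
            i += 1  # no n-gram starting here is in the lexicon: skip syllable
    return classes


def semantic_relations(classes: List[str]) -> Set[Tuple[str, str, str]]:
    """Relations licensed by the toy ontology between classes in the phrase."""
    found: Set[Tuple[str, str, str]] = set()
    for a in classes:
        for b in classes:
            rel = RELATIONS.get((a, b))
            if rel is not None:
                found.add((a, rel, b))
    return found


def features(phrase: str) -> Set[object]:
    """A phrase's features: its semantic classes plus its relation triples."""
    classes = semantic_classes(phrase)
    return set(classes) | semantic_relations(classes)


def phrasal_distance(p: str, q: str) -> float:
    """Illustrative distance: 1 - Jaccard similarity of the feature sets."""
    fp, fq = features(p), features(q)
    if not fp and not fq:
        return 0.0
    return 1.0 - len(fp & fq) / len(fp | fq)


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard precision and recall over sets of document/phrase ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    print(semantic_classes("sinh viên đọc sách"))  # ['PERSON', 'ACTION_READ', 'DOCUMENT']
    print(phrasal_distance("sinh viên đọc sách", "giáo viên đọc tài liệu"))  # 0.0
    print(round(phrasal_distance("sinh viên đọc sách", "sinh viên đọc"), 2))  # 0.4
    print(precision_recall({"d1", "d2", "d3"}, {"d1", "d2", "d4", "d5"}))
```

In this sketch, "sinh viên đọc sách" ("students read books") and "giáo viên đọc tài liệu" ("teachers read documents") share every class and relation, so their distance is 0 even though no words overlap beyond "đọc". This is the kind of behaviour a class-and-relation-based distance is meant to capture, as opposed to surface string matching.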


Published

14-09-2015

How to Cite

[1] T. T.-T. Do and D. T. Nguyen, “Phrasal semantic distance for Vietnamese textual document retrieval”, JCC, vol. 31, no. 3, p. 185, Sep. 2015.

Section

Computer Science