A FORMULA TO CALCULATE PRUNING THRESHOLD FOR THE PART-OF-SPEECH TAGGING PROBLEM
Keywords:Hidden Markov model, Part-of-speech tagging, Viterbi algorithm, Beam search.
The exact tagging of the words in the texts is a very important task in the natural language processing. It can support parsing the text, contribute to the solution of the polysemous word, and help to access a semantic information, etc. One of crucial factors in the POS (Part-of-Speech) tagging approaches based on the statistical method is the processing time. In this paper, we propose an approach to calculate the pruning threshold, which can apply into the Viterbi algorithm of Hidden Markov model for tagging the texts in the natural language processing. Experiment on the 1.000.000 words on the tag of the Wall Street Journal corpus showed that our proposed solution is satisfactory.
How to Cite
Authors who publish with Vietnam Journal of Science and Technology agree with the following terms:
- The manuscript is not under consideration for publication elsewhere. When a manuscript is accepted for publication, the author agrees to automatic transfer of the copyright to the editorial office.
- The manuscript should not be published elsewhere in any language without the consent of the copyright holders. Authors have the right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of their work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are encouraged to post their work online (e.g., in institutional repositories or on their websites) prior to or during the submission process, as it can lead to productive exchanges or/and greater number of citation to the to-be-published work (See The Effect of Open Access).