
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign

Minh Quang Nhat Pham

Abstract


In this report, we describe our named-entity recognition system, which participated in the VLSP 2018 evaluation campaign. We formalize the task as a sequence labeling problem using the BIO encoding scheme. We apply a feature-based model that combines word features, word-shape features, Brown-cluster-based features, and word-embedding-based features. We compare several methods for handling nested entities in the dataset and show that combining the tags of entities at all levels to train a single sequence labeling model (the joint-tag model) improves the accuracy of nested named-entity recognition.
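
To illustrate the joint-tag idea, the following is a minimal sketch, not the system's actual implementation. It encodes each nesting level with BIO tags and then concatenates the per-level tags of every token into one label that a single sequence labeler could be trained on. The helper names (`bio_tags`, `joint_tags`, `split_joint`), the `+` separator, and the example sentence are assumptions made for illustration; the abstract does not specify how the tags are joined.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); a hypothetical span format for this sketch
Span = Tuple[int, int, str]


def bio_tags(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode the non-overlapping spans of one nesting level as BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def joint_tags(levels: List[List[str]], sep: str = "+") -> List[str]:
    """Join the tags of all levels token by token (assumed separator: '+')."""
    return [sep.join(column) for column in zip(*levels)]


def split_joint(tags: List[str], sep: str = "+") -> List[List[str]]:
    """Recover the per-level tag sequences from joint tags."""
    return [list(level) for level in zip(*(t.split(sep) for t in tags))]


# Hypothetical example: an ORG entity that contains a nested LOC entity.
tokens = ["University", "of", "Hanoi", "opened"]
level1 = bio_tags(len(tokens), [(0, 3, "ORG")])  # outer level
level2 = bio_tags(len(tokens), [(2, 3, "LOC")])  # nested level
joint = joint_tags([level1, level2])

for token, tag in zip(tokens, joint):
    print(f"{token}\t{tag}")
```

In this sketch the joint label set grows with the number of distinct tag combinations seen in the training data, and predictions are mapped back to per-level tags with `split_joint` before scoring.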

Keywords


Nested named-entity recognition, CRF, VLSP






DOI: https://doi.org/10.15625/1813-9663/34/4/13163

Journal of Computer Science and Cybernetics ISSN: 1813-9663

Published by Vietnam Academy of Science and Technology