A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign
DOI: https://doi.org/10.15625/1813-9663/34/4/13163
Keywords: Nested named-entity recognition, CRF, VLSP
Abstract
In this report, we describe the named-entity recognition system we submitted to the VLSP 2018 evaluation campaign. We formalized the task as a sequence labeling problem using the BIO encoding scheme. We applied a feature-based model that combines word features, word-shape features, Brown-cluster-based features, and word-embedding-based features. We compared several methods for handling the nested entities in the dataset. Our results show that one approach improved the accuracy of nested named-entity recognition: combining the tags of entities at all nesting levels to train a single sequence labeling model (the joint-tag model).
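The abstract names three mechanisms without showing them: BIO encoding, per-token features for a CRF, and the joint-tag treatment of nested entities. The Python sketch below illustrates all three under stated assumptions. It is not the authors' code. The following choices are all hypothetical:

- the `(start, end, type)` span format;
- the `+` separator for joint labels;
- the illustrative `ORG`/`LOC` types;
- the underscore-joined Vietnamese word segmentation;
- the Brown-path prefix lengths (4, 6, 10);
- the one-word context window.

The sketch omits word-embedding-based features.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type) over token indices.
Span = Tuple[int, int, str]


def bio_tags(n_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Encode the (non-overlapping) entities of one nesting level as BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def joint_tags(levels: Sequence[List[str]], sep: str = "+") -> List[str]:
    """Concatenate the tags of every nesting level at each token into one label."""
    return [sep.join(per_level) for per_level in zip(*levels)]


def split_joint(tag: str, sep: str = "+") -> List[str]:
    """Recover the per-level tags from a single predicted joint label."""
    return tag.split(sep)


def word_shape(word: str) -> str:
    """Map uppercase/lowercase/digit characters to X/x/d and collapse repeats."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # keep punctuation and underscores as-is
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(words: Sequence[str], i: int, brown: Dict[str, str],
                   prefix_lengths: Sequence[int] = (4, 6, 10)) -> Dict[str, str]:
    """Hypothetical feature dict: word, shape, Brown-path prefixes, neighbours."""
    w = words[i]
    feats = {"w": w.lower(), "shape": word_shape(w)}
    path = brown.get(w.lower())  # bit-string path from a Brown clustering run
    if path is not None:
        for p in prefix_lengths:  # assumed prefix lengths, not from the paper
            feats["brown%d" % p] = path[:p]
    feats["w-1"] = words[i - 1].lower() if i > 0 else "<s>"
    feats["w+1"] = words[i + 1].lower() if i + 1 < len(words) else "</s>"
    return feats


if __name__ == "__main__":
    # Illustrative sentence (word-segmented with underscores): an ORG with a nested LOC.
    words = ["Đại_học", "Quốc_gia", "Hà_Nội", "tổ_chức", "hội_thảo"]
    outer = bio_tags(len(words), [(0, 3, "ORG")])  # level-1 entity
    inner = bio_tags(len(words), [(2, 3, "LOC")])  # level-2 entity nested inside it
    joint = joint_tags([outer, inner])
    print(joint)
    # ['B-ORG+O', 'I-ORG+O', 'I-ORG+B-LOC', 'O+O', 'O+O']
    print(split_joint(joint[2]))
    # ['I-ORG', 'B-LOC']
```

The joint label at each token encodes every nesting level. A single sequence labeler can therefore predict outer and inner entities together, and `split_joint` recovers the per-level BIO sequences afterwards. One trade-off of this design is that the label set grows to every tag combination seen in training. The feature dicts and joint labels could be passed to a linear-chain CRF toolkit.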
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant the following about the Work: it presents our own results; it has not been published before in its current or a substantially similar form; it is not under consideration for another publication; it does not contain any unlawful statements; and it does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials not owned by me/us, including tables, diagrams or photographs.