VLSP Shared Task: Named Entity Recognition
DOI: https://doi.org/10.15625/1813-9663/34/4/13161
Keywords: CoNLL format, evaluation, named entity, named entity recognition, shared task, Vietnamese, VLSP workshop
Abstract
Named entities (NEs) are phrases that contain the names of persons, organizations, and locations, as well as expressions of times, quantities, monetary values, percentages, etc. Named Entity Recognition (NER) is the task of recognizing named entities in documents. NER is an important subtask of Information Extraction and has attracted researchers all over the world since the 1990s. For Vietnamese, although some research projects and publications on the NER task existed before 2016, no systematic comparison of the performance of NER systems had been done. In 2016, the organizing committee of the VLSP workshop decided to launch the first NER shared task in order to obtain an objective evaluation of Vietnamese NER systems and to promote the development of high-quality systems. As a result, the first dataset with morpho-syntactic and NE annotations was released for benchmarking NER systems. At VLSP 2018, the NER shared task was organized for the second time, providing a bigger dataset containing texts from various domains, but without morpho-syntactic annotation. These resources are available for research purposes via the VLSP website vlsp.org.vn/resources. In this paper, we describe the datasets as well as the evaluation results obtained from these two campaigns.
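The abstract lists "CoNLL format" and "evaluation" as keywords but does not spell out the scoring procedure. As an illustration only, the sketch below assumes the common CoNLL-style convention: one token per line with the NE tag in the last column, blank lines between sentences, IOB2 tags such as B-PER, I-PER, O, and entity-level exact-match precision, recall, and F1. The function names `read_conll`, `iob_to_spans`, and `evaluate` are hypothetical; this is not the official VLSP evaluation script.

```python
from typing import List, Optional, Set, Tuple

# An entity span: (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def read_conll(lines: List[str]) -> List[List[str]]:
    """Collect the NE tag of each token, grouped by sentence.

    Assumption: the NE tag is the last whitespace-separated column and
    sentences are separated by blank lines.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split()[-1])
    if current:
        sentences.append(current)
    return sentences


def iob_to_spans(tags: List[str]) -> Set[Span]:
    """Convert IOB2 tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    start: int = 0
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if etype is not None:
                spans.add((start, i, etype))  # close the open entity
                etype = None
            if tag != "O":
                # B-X, or an I-X that does not continue the open entity,
                # is treated here as the start of a new entity.
                start, etype = i, tag[2:]
    return spans


def evaluate(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, F1: a predicted entity counts only if
    its boundaries and type both match a gold entity exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files must have the same number of sentences")
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = iob_to_spans(g), iob_to_spans(p)
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]  # second entity has the wrong type
    print(evaluate(gold, pred))  # → (0.5, 0.5, 0.5)
```

Scoring whole entities rather than individual tags means a prediction with the right boundaries but the wrong type earns no credit, which is why the toy example above scores 0.5 rather than 0.75.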
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute, and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish, or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work presents our own results, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams, or photographs, not owned by me/us.