Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Publication type

Paper in proceedings

Document version




Author(s) (e.g. Milan Marković, Nikola Nikolić)

Branislava Šandrih, Cvetana Krstev, Ranka Stanković

Paper title (Title - subtitle)

Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Conference (proceedings) name, place and date

Proceedings - Natural Language Processing in a Deep Learning World

Publisher (e.g. Belgrade : Prosveta)

Incoma Ltd., Shoumen, Bulgaria

Year of publication


Abstract in English

In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts, which was used to prepare a gold standard annotated with personal names. The system was also used to prepare training sets for four different levels of annotation, which in turn were used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on two sample texts: a part of the gold standard and an independent newspaper text of approximately the same size. The results show that the rule- and lexicon-based system outperforms the trained models in all four scenarios (measured by F1), while the Stanford models have the highest recall. The produced models are incorporated into the Web platform NER&Beyond, which provides various NE-related functions.

First page


Last page


DOI


Keywords in Serbian (separated by ", ")

NER, Sistemi za prepoznavanje imenovanih entiteta, srpski, lična imena

Keywords in English (separated by ", ")

NER, Named Entity Recognition Systems, Serbian, Personal Names


Broad category of the paper according to the MPNT rulebook


Narrow category of the paper according to the MPNT rulebook


Access level

Open access


Creative Commons – Attribution-Share Alike 4.0 International

File format

