
Collected Item: “Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian”

Publication type

Paper in conference proceedings

Document version

Published

Language

English

Author(s)

Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić

Title (Title - Subtitle)

Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian

Conference (proceedings) name, venue and date

Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, May 2020

Publisher

European Language Resources Association

Year of publication

2020

Abstract (English)

The training of new tagger models for Serbian is primarily motivated by the extension of the existing tagset with the grammatical category of gender. Harmonizing resources that were manually annotated within different projects over a long period of time was an important task, made possible by the development of tools that support partial automation. These supporting tools take different taggers and tagsets into account. This paper focuses on the TreeTagger and spaCy taggers and on aligning the annotation schemas of the Serbian morphological dictionaries, MULTEXT-East, and the Universal Part-of-Speech (UPOS) tagset. The trained models will be used to publish the new version of the Corpus of Contemporary Serbian as well as the Serbian literary corpus. The performance of the developed taggers was compared and the impact of training-set size was investigated; both new models reached around 98% PoS-tagging precision per token. The sr_basic annotated dataset will also be published.
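The abstract mentions aligning MULTEXT-East tags with the Universal Part-of-Speech tagset and reporting a per-token score. As a rough illustration only, the sketch below assumes a simplified, hypothetical mapping keyed on the first one or two characters of a MULTEXT-East MSD (morphosyntactic description) string. It is not the mapping used in the paper, and the names CATEGORY_TO_UPOS, SUBTYPE_TO_UPOS, msd_to_upos and per_token_score are invented for this example.

```python
# Simplified, HYPOTHETICAL mapping from MULTEXT-East MSD tags to Universal
# POS (UPOS) tags. The first MSD character encodes the part of speech; a few
# subtypes (first two characters) are split out. Not the paper's mapping.
from typing import Sequence

CATEGORY_TO_UPOS = {
    "N": "NOUN",   # noun
    "V": "VERB",   # verb
    "A": "ADJ",    # adjective
    "P": "PRON",   # pronoun (some pronouns map to DET in practice)
    "R": "ADV",    # adverb
    "S": "ADP",    # adposition
    "C": "CCONJ",  # conjunction (coordinating by default)
    "M": "NUM",    # numeral
    "Q": "PART",   # particle
    "I": "INTJ",   # interjection
    "Y": "X",      # abbreviation
    "X": "X",      # residual
    "Z": "PUNCT",  # punctuation
}

# Subtype overrides keyed on the first two MSD characters.
SUBTYPE_TO_UPOS = {
    "Np": "PROPN",  # proper noun
    "Va": "AUX",    # auxiliary verb
    "Cs": "SCONJ",  # subordinating conjunction
}


def msd_to_upos(msd: str) -> str:
    """Map one MULTEXT-East MSD string to a coarse UPOS tag (X if unknown)."""
    if msd[:2] in SUBTYPE_TO_UPOS:
        return SUBTYPE_TO_UPOS[msd[:2]]
    return CATEGORY_TO_UPOS.get(msd[:1], "X")


def per_token_score(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted tag equals the gold tag.

    With exactly one predicted tag per token, per-token precision and
    accuracy coincide.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Illustrative MSDs for "Marko čita knjigu ." (Marko reads a book.)
    gold_msd = ["Npmsn", "Vmr3s", "Ncfsa", "Z"]
    gold_upos = [msd_to_upos(m) for m in gold_msd]
    print(gold_upos)  # ['PROPN', 'VERB', 'NOUN', 'PUNCT']
    predicted = ["PROPN", "VERB", "ADJ", "PUNCT"]  # one tagging error
    print(per_token_score(gold_upos, predicted))  # 0.75
```

A real alignment would also need feature-level mappings (gender, case, number) and lexical exceptions such as pronoun/determiner splits, which this coarse sketch deliberately ignores.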

First page

3954

Last page

3962

Keywords (English, separated by ", ")

Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary

Broad category of the paper according to the MPNT rulebook

M30

Narrow category of the paper according to the MPNT rulebook

M33

Access level

Open access

License

All rights reserved

File format

.pdf