Претрага
7 items
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection... were not present in the existing domain terminology. Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection 1. Motivation Terminology is rapidly developing in many research and technological fields. It is very difficult to produce and main- ...
... use neither full parsing (as not available for Serbian) nor we lemmatize and POS-tag text. Instead we use shallow parsing relying on extensive morphological e-dictionaries of Serbian (Cvetana Krstev, Duško Vitas, 2015) that not only helps to identify terminology precisely, but also enables production ...
... terminological nominal phrases; • Bilingual Serbian/English list of inflected word forms and MWE pairs derived from bilingual dictionaries and morphological (inflected) dictionaries for Serbian and English; 4.1. Aligned/parallel corpus The English/Serbian textual resource was derived from the journal ...Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Combining Heterogeneous Lexical Resources
... ones are: • The system of morphological dictionaries of Serbian (SMD) in Intex format (Silberztein, 2000), that consists of a dictionary of simple lemmas, a dictionary of compounds (under construction), the corresponding dictionaries of word forms, and morphological finite-state automata that ...
... Mathematics, University of Belgrade is the development of various lexical resources. Among them the two most important ones are: the system of morphological dictionaries of Serbian (SMD) in Intex format and the Serbian wordnet (SWN) developed in the scope of the Balkanet project. Although these two ...
... retrieval task more efficient. In a number of cases this additional information disambiguates the otherwise homonymous literals. This additional morphological, syntactic, and semantic information can be transferred from the SMD of simple forms and associated to each synset literal in SWN. For instance ...Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth Interantional Conference on Language Resources and Evaluation, Lisabon, Portugal , May 2004, vol. 4, ELRA - European Language Resources Association (2004)
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... resources in any part of the system, wherever they are needed. Thus, for example, morphological dictionaries can be used for adding additional morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned ...
... 03:28:08 Production of morphological dictionaries of multi-word units using a multipurpose tool Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas Дигитални репозиторијум Рударско-геолошког факултета Универзитета у Београду [ДР РГФ] Production of morphological dictionaries of multi-word ...
... automated production of lemmas for e-dictionaries of multi-word units. Development of morphological dictionaries of MWUs is a tedious task, especially in the case of Serbian and other languages featuring complex morphological structures. After realizing that the development of such a dictionary manually is ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the dataobtainedfromSrpKorandlocalgrammarsassistsinmakingadecisionbetween several candidates in cases of ambiguity. The evaluation results reveal that,dependingonthetext,accuracyrangesfrom95.03%to99.36%,whilethe precision (average 98.93%) is always higher than the recall (average 94.94%).Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)In this paper, we will present an approach in transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use category labels lexical entries are marked with can be used to link lexical entries. ...... can be used to link lexical entries. The ini- tial morphological dictionaries were Serbian Morphological Dictionaries. However, we will show multilingual application of Leximirka us- ing French Morphological Dictionaries. KEYWORDS: morphological dictionaries, language resources, Leximirka. PAPER ...
... working on the development of Serbian morphological dictionaries more than 25 years ago (Vitas, 1993; Krstev, 1997; Vitas et al., 1993). Morphological dictionaries represent a significant linguistic resource for languages with rich flexion. Therefore, Serbian morphological dictionaries represent a significant ...
... also exist for many other languages: German, Bulgarian, Pol- ish, Greek, Russian etc. The system of morphological dictionaries is based on the theory of finite-state automata, namely on morphological and local grammars in the form of finite-state transducers that generate all morpho- logical forms of ...Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of a gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...... model for Serbian are: (a) Serbian morphological dic- tionaries (Cvetana Krstev, Duško Vitas, 2015) (SMD); (b) pre-annotated texts (Duško Vitas, Cvetana Krstev, Ranka Stanković, Miloš Utvić, 2019). 2.1. Serbian morphological dictionaries Serbian morphological dictionaries represent a rich lexical ...
... account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment between Serbian morphological dictionaries, MULTEXT-East and Universal Part-of-Speech tagset. The trained models will be used to publish the new version of the Corpus of Co ...
... new models. The SR BASIC annotated dataset will also be published. Keywords: Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary 1. Introduction The task of assigning to each token its Part-of-Speech cat- egory (noun, verb, adjective, etc.) is a common Natural ...Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May Year: 2020, Marseille, France, European Language Resources Association (2020)
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
U ovom radu predstavljen je model koji omogućava prikupljanje, pripremu, opis metapodataka, upravljanje i eksploataciju, uključujući pretragu punog teksta dokumenata iz domena kriminalistike napisanih na srpskom jeziku. Predloženi pristup primenjuje se na veb portalu koji sakuplja različite tekstove nastale iz časopisa Akademije za kriminalistiku i policijske studije, Krivičnog zakona Srbije, konferencija „Tara“ i „Reiss“, kao i iz nekih doktorskih disertacija vezanih za ovu oblast istraživanje. Nakon obrade teksta, korpus koji sadrži preko 5500 stranica običnog teksta, kreiran je i ...... word, without changing the form of words. Extended search includes morphological and semantic search. Morphological search includes search of all inflected forms of specified word that retrieve from SrpMD (Serbian morphological dictionary). For nouns, grammatical forms include case and number for ...
... metadata search are customized and improved by query expansion via web service relaying on the Serbian morphological dictionary and the Serbian WordNet semantic network for providing morphological and semantic text search expansion. The paper outlines possibilities for further use and analysis on a ...
... The RESTfull Web services based on Unitex routines are used for the implementation of morphological analysis and output generation relaying on electronic dictionaries. For query expansion are combined morphological and semantic vocabularies, because synonymous terms are taken from WordNet11 and t ...Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)