Search
47 items
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%). Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
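The candidate-lookup idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, its frequency counts, the `đ` to `dj` degradation convention and all function names are assumptions made for illustration, and a simple frequency vote stands in for the corpus data and local grammars that the paper uses to resolve ambiguity.

```python
from collections import defaultdict

# Assumed degradation convention: diacritic letters mapped to plain ASCII.
DEGRADE = str.maketrans({"š": "s", "č": "c", "ć": "c", "ž": "z", "đ": "dj"})

# Hypothetical toy lexicon: word form -> made-up corpus frequency.
LEXICON = {"kuća": 300, "kuca": 20, "šuma": 120, "suma": 40, "pas": 100, "reč": 80}


def degrade(word: str) -> str:
    """Strip diacritics the way a degraded-alphabet text would."""
    return word.translate(DEGRADE)


def build_index(lexicon):
    """Group dictionary forms by their degraded spelling."""
    index = defaultdict(list)
    for form, freq in lexicon.items():
        index[degrade(form)].append((form, freq))
    return index


INDEX = build_index(LEXICON)


def candidates(word: str):
    """All dictionary forms that degrade to the given spelling."""
    return [form for form, _ in INDEX.get(word, [])]


def restore(word: str) -> str:
    """Pick the most frequent candidate; leave unknown words unchanged."""
    options = INDEX.get(word)
    if not options:
        return word
    return max(options, key=lambda pair: pair[1])[0]


def restore_text(text: str) -> str:
    """Restore a lower-case, whitespace-separated text word by word."""
    return " ".join(restore(token) for token in text.split())
```

In this sketch `candidates("suma")` yields both `šuma` and `suma`, which is the kind of ambiguity that needs contextual information to resolve reliably; picking the most frequent form ignores context entirely.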
-
A System for Named Entity Recognition Based on Local Grammars
Krstev Cvetana, Obradović Ivan, Utvić Miloš, Vitas Duško. "A System for Named Entity Recognition Based on Local Grammars" in Journal of Logic and Computation 24 no. 2, Oxford University Press (2014): 473-489. https://doi.org/10.1093/logcom/exs079
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... employees' publications. The Repository is available at: www.dr.rgf.bg.ac.rs. Terminology acquisition and description using lexical resources and local grammars. Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić, University of Belgrade ...
... cally complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Towards Automatic Definition Extraction for Serbian
The paper presents preliminary results of automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SASA) were used to model different types of definitions (descriptive, grammatical, reference-based and synonym-based) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 definitions of nouns, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in an open ...... consists of 61,213 definitions of nouns. During the development of local grammars, we reduced this set by excluding definitions containing unknown words (not recorded in the e-dictionaries used by local grammars) since local grammars even if appropriate for them could not be successful. We thus obtained ...
... When developing these models by means of local grammars with FSTs we tend to capture all types of definitions, namely descriptive, reference-based and synonym-based. The coverage of noun definitions in the SASA dictionary by the developed local grammars is represented in Table 1. We have mentioned ...
... lexical features. The research corpus consists of 61,213 definitions of nouns, which were analysed using Serbian morphological e-dictionaries and local grammars implemented as finite state transducers in an open-source corpus processing suite Unitex. The 21 models developed up to the present moment cover ... Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Development of Open Educational Resources (OER) for Natural Language Processing
In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...... lemmas already in e-dictionaries). 8. The use of contexts in graphs that shift grammars modelled by regular graphs from context-free grammars towards context-sensitive grammars. These enable construction of grammars for shallow parsing. The illustration of context use is presented by recognition ...
... resources programme. [11] Unitex is based on finite-state technology. It enables application of morphological electronic dictionaries and grammars to texts for a number of different languages: French, English, Greek ... (footnotes: 10 http://baektel.eu/?menu=partners; 11 http://www-igm.univ-mlv.fr/~unitex/)
... under the direction of its director, Maurice Gross. [12] With Unitex, the user can develop electronic resources such as electronic dictionaries and grammars and apply them. Text analyses can be performed at the levels of strings, morphology, and syntax. Some of the functions are: developing and ... Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
-
Managing mining project documentation using human language technology
Purpose: This paper aims to develop a system that would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ... Keywords: Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation. Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...... crucial; for others, like temporal expressions, local grammars in the form of FSTs that try to capture a variety of syntactic forms in which a NE can occur had to be developed. However, for all of them local grammars were developed that use wider context to disam- ... (footnote 4: http://resursi.mmiljana.com/Default.aspx)
... Serbian [6]. 4.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [1], [2]. The role of electronic dictionaries, covering both simple words ...
... disambiguate ambiguous occurrences as much as possible [7]. These local grammars were organized in cascades that further resolve ambiguities [10]. The NER system was evaluated on a newspaper corpus and results reported in [7] showed that the F-measure of recognition was 0.96 for types and 0.92 for tokens. ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014) ... represents a textual resource that LSS makes use of. Specific features of Serbian grammar need corresponding language resources in the form of grammars. Grammars within LSS are implemented by so-called finite state automata, finite state transducers and compound inflection rules (Krstev, 2008). ...
... resources supporting the multilinguality of the platform, terminology and its search and browse functions are lexical and textual resources and grammars. Implementation resources consist of best practice design principles and licensing tools to promote OER. Originality/value – Designing usable ...
... case studies, best practice examples, expert presentations and software demonstrations; • Language resources – lexical and textual resources and grammars to support the multilinguality of the platform, terminology and its search and browse functions; • Implementation resources - best practice design ... Ivan Obradović, Ranka Stanković. "Using technology for knowledge transfer between academia and enterprises" in Knowledge and Management Models for Sustainable Growth, Proc. of IFKAD 2014, 9th International Forum on Knowledge Asset Dynamics, 11-13 June 2013, Matera, Italy, Bari : IFKAD (2014)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...... t, 2003). Although the statistical approach has been steadily pursued by a number of researchers, development of lexical resources and local grammars has given impetus to an alternative approach, namely multi-word extraction based on linguistic rules. Recently, a rule-based approach for ...
... [Figure: rule-based term extraction and lemmatization workflow. Resources: Unitex, e-dictionaries, local grammars, domain-specific corpora. Steps: 1. Tokenization and lexical analysis; 2. Rule-based MWU term extraction and normalization; 3. Rule-based lemma correction and filtering; ... 5. DELAC dictionary production] ...
... 5. Concluding remarks and Future Work The paper presents an approach to terminology extraction for Serbian based on e-dictionaries and local grammars. For extraction purposes 14 graphs were developed, which extract the most frequent syntactic structures identified by an analysis of several ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23-28 May 2016, European Language Resources Association (2016)
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within the BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...... indexing, etc.), annotation of selected resources, OER repository on local edX platform. The BAEKTEL language support system consists of several software components that simultaneously administer language resources: grammars, lexical and textual resources (Image 1). 4. LEXICAL RESOURCES Mor ...
... pp. 473–479. [15] C. Krstev, R. Stanković, I. Obradović and B. Lazić, “Terminology Acquisition and Description Using Lexical Resources and Local Grammars,” in Proc. 11th Conference on Terminology and Artificial Intelligence, 2015, pp. 81–89. ...
... recognition, extraction and lemmatization. Picture 1 illustrates steps in terminology extraction. Crucial resources are morphological dictionaries and grammars. They are combined with some statistical measures for term extraction. The first step is analysis of terms in existing term base mentioned before ... Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
-
The Nooj System as Module within an Integrated Language Processing Environment
... Standard edition of NooJ. Noojapply.exe allows users to apply dictionaries and grammars automatically to texts or corpora. This paper presents two search examples using NooJ regular expressions and the NooJ syntactic grammars. Different options in application of noojapply.exe are presented on the left ...
... User can then use the generated graphs in Intex environment to perform the actual retrieval, or import them in NooJ and convert to syntactic grammars in order to perform the same task. Figure 5. The edit view, the hypernym/hyponym and graph view of a synset 3.3. The exchange of information ...
... application of different lexical resources. After selecting the type of noojapply.exe usage the user can choose the dictionaries and morphological grammars that he wishes to apply from a list of available lexical resources. Next, one or more text files or corpus should be selected from a list and ... Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
A Mathematical Learning Environment Based on Serbian Language Resources
In recent years, in line with ever growing usage of Information technology, the learning environments are changing. The amount of available learning materials in various forms has increased. These new environments demand comprehensive learning systems, which enable management of the learning corpus with special attention paid to relevant lexical resources. In this paper we present the concept of a Mathematical Learning Environment in Serbian (MLES), which is based on a corpus of mathematical materials and various lexical resources, enabling ...... expressions (MWE). Thus a procedure described in [12] has been used for semi-automatic extraction of MWEs on basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use ...
... is given. The salient feature of the system is strong lexical support. Within MLES various types of lexical resources are used as well as local grammars, with the aim to provide a comprehensive and searchable learning environment. Although the general lexica in Serbian is well covered, ma ...
... Belgrade [12] Krstev, C., Stanković, R., Obradović, I., Lazić, B., (2015). Terminology Acquisition and Description Using Lexical Resources and Local Grammars. Proceedings of the 11th Conference on Terminology and Artificial Intelligence, pp. 81-89. [13] Mladenović, M., Mitrović, M., Krstev, C ... Radojičić Marija, Obradović Ivan, Stanković Ranka, Utvić Miloš, Kaplar Sebastijan. "A Mathematical Learning Environment Based on Serbian Language Resources" in Proceedings of the 7th International Scientific Conference Technics and Informatics in Education, Faculty of Technical Sciences, Čačak (2018)
-
Увођење доменских и семантичких маркера за област рударства у српске електронске речнике
... Main, pp. 3–17. Крстев и др., 2013: Cvetana Krstev, Ivan Obradović, Miloš Utvić, Duško Vitas, “A system for named entity recognition based on local grammars”, In: J Logic Computation 24 (2), pp. 473–489. Крстев/Лазић, 2015: Цветана Крстев, Биљана Лазић, „Глаголи у кухињи и за столом”, Научни састанак ...
... 2015: Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić “Terminology Acquisition and Description Using Lexical Resources and Local Grammars”, In: Proc. of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, eds. Thierry Poibeau and Pamela Faber, LexiCon ... Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Београд : Међународни славистички центар на Филолошком факултету, Филолошки факултет (2017). https://doi.org/10.18485/msc.2017.46.3.ch10
-
A Tel Platform Blending Academic And Entrepreneurial Knowledge
... The BAEKTEL language support system consists of several software components handling simultaneously several types of language resources: grammars, lexical and textual resources (Fig 2). One of the basic lexical resources is the system of morphological dictionaries of Serbian simple words ...
... XML-compliant. It should finally be mentioned that due to the complex Serbian grammar the language support system also features grammars implemented through finite state automata, finite state transducers and compound inflection rules. The language resources in the BAEKTEL ... Ivan Obradović, Ranka Stanković, Jelena Prodanović, Olivera Kitanović. "A Tel Platform Blending Academic And Entrepreneurial Knowledge" in Proceedings of the Fourth International Conference on e-Learning (eLearning-2013), September 2013, Belgrade, Serbia, Belgrade, Serbia : Belgrade Metropolitan University (2013)
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...... expressions, local grammars in the form of FSTs that try to capture a variety of syntactic forms in which a NE can occur had to be developed. However, for all of them local grammars were developed that use wider context to disambiguate ambiguous occurrences as much as possible [13]. These local grammars were ...
... Serbian [12]. 3.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [3,7]. The role of electronic dictionaries, covering both simple words ...
... 13. Krstev, C., Obradović, I., Utvić, M., Vitas, D.: A system for named entity recognition based on local grammars. J. Logic Comput. 24(2), 473–489 (2014) 14. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, vol. 1. Cambridge ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... Keywords: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion. ... phase in information extraction. Serbian morphological dictionaries and local grammars are successfully being used for recognition of names of persons and of various functions they might perform within the society. Local grammars for recognition of functions can recognize various syntactic structures ...
... but, naturally, not all of them. The use of MWUs can contribute to the increase of the recall without further complicating the local grammars. For example, the local grammar does not recognize the function of the person acting as specijalni izaslanik UN za pregovore o statusu Kosova Marti Ahtisari ...
... Kosovo Martti Ahtisaari’ because the addition o statusu ‘on the status’ is not foreseen by the local grammar. When pregovori o statusu ‘negotiations on the status’ are added to the MWU dictionary, the local grammar covers the aforementioned structure as well. This example leads us to possible applications ... Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions. ... pertaining to the representation of specific phenomena. They are often a consequence of established practices of representation of these phenomena in grammars. For example, the number category does not exist in English and French, which use the determiner category instead, whereas other languages do ...
... results of the analysis point to the need of making a stronger connection between metadata from *.def files and resources – NooJ dictionaries and grammars. We suggest the replacement of the textual form of metadata by a more formal form in XML format, with a defined schema which would enable a stricter ...
... establishing a relation between them and NooJ existed. Along the lines of the observation that it would be particularly important if the concept of local grammar could be generalized in such a way that one grammar extracts (approximately) the same concepts from aligned texts in different languages ... Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
-
Creation of a Training Dataset for Question-Answering Models in Serbian
The development and application of artificial intelligence in language technologies have advanced significantly in recent years, especially in the domain of the question answering (QA) task. While existing resources for QA tasks have been developed for the major world languages, Serbian is relatively neglected in this area. This paper presents an initiative to create an extensive and diverse dataset for training question-answering models in Serbian, which will contribute to the advancement of language technologies for Serbian. In addition to numerous studies on language models ... Keywords: artificial intelligence, natural language processing, language resources, annotated datasets, information extraction, question answering. Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov. "Creation of a Training Dataset for Question-Answering Models in Serbian" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
-
Increasing the Local Road Network Resilience from Natural Hazards in Municipalities in Serbia
Biljana Abolmasov, Miloš Marjanović, Ranka Stanković, Uroš Đurić, Nikola Vulović. "Increasing the Local Road Network Resilience from Natural Hazards in Municipalities in Serbia" in Progress in Landslide Research and Technology, Volume 3, Issue 1, Springer Cham. (2024). https://doi.org/10.1007/978-3-031-55120-8_22
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020) In this paper, we will present an approach in transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use category labels lexical entries are marked with can be used to link lexical entries. ...... ish, Greek, Russian etc. The system of morphological dictionaries is based on the theory of finite-state automata, namely on morphological and local grammars in the form of finite-state transducers that generate all morphological forms of words in the dictionary (Krstev, 2008). 2 Laboratoire d’Automatique ...
... 2018. https://www.researchgate.net/publication/265297624_A_SKOS-based_Schema_for_TEI-encoded_Dictionaries Gross, Maurice. “The construction of local grammars”. In Finite State Language Processing eds. Emmanuel Roche and Yves Schabes (1997): 329–354, accessed September 1, 2015. https://halshs.archi ... Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4