Search
47 items
-
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present the Serbian literary corpus being developed under the umbrella of COST Action "Distant Reading for European Literary History" CA16204. Using this corpus of novels written more than a century ago, we developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different named entity types, with a convolutional neural network (CNN), which has an F1 score of ≈91% on the test dataset. This model was further evaluated on a separate evaluation dataset. We conclude with a comparison ...... Section 6. 2 Related Work The existence of large-scale lexical resources for Serbian, e-dictionaries in particular (Krstev, 2008), coupled with local grammars in the form of finite-state transducers (Vitas and Krstev, 2012), enabled the development of a comprehensive rule-based system for NER SrpNER ...
... 2019-Abstract-Booklet.pdf. Cvetana Krstev, Ivan Obradović, Miloš Utvić, and Duško Vitas. 2014. A System for Named Entity Recognition Based on Local Grammars. Journal of Logic and Computation, 24(2):473–489. Cvetana Krstev and Ranka Stanković. 2020. Old or New, we Repair, Adjust and Alter (Texts) ... Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... bian morphological dictionaries and local grammars are successfully being used for recognition of names of persons and of various functions they might perform within the society [10]. Local grammars for recognition of functions can recognize ...
... structures but, naturally, not all of them. The use of MWUs can contribute to the increase of the recall without further complicating the local grammars. For example, the local grammar does not recognize the function of the person acting as specijalni izaslanik UN za pregovore o statusu Kosova Marti Ahtisari ...
... Kosovo Martti Ahtisaari’ because the addition o statusu ‘on the status’ is not foreseen by the local grammar. When pregovori o statusu ‘negotiations on the status’ are added to the MWU dictionary, the local grammar covers the aforementioned structure as well. This example leads us to possible ap ... Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
-
Building learning capacity by blending different sources of knowledge
... support system, whose structure is outlined in Figure 3, is based on electronic language resources, namely, lexical resources, textual resources and grammars. The simplest multilingual lexical resources in general are bilingual dictionaries in electronic form. However, for their full functionality in ...
... envisaged as OER languages within the BAEKTEL network. Besides morphological dictionaries, for full functionality of the language support system grammars are also needed, and they are implemented by the so-called finite state automata, finite state transducers and compound inflection rules (Krstev ... Ivan Obradović, Ranka Stanković, Olivera Kitanović, Dalibor Vorkapić. "Building learning capacity by blending different sources of knowledge" in International Journal of Learning and Intellectual Capital (2016). https://doi.org/10.1504/IJLIC.2016.075698
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of a gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...... the MULTEXT-East tagset (Erjavec, 2012) was also tailored to be universal. SMD uses its own tagset that corresponds closely to Serbian traditional grammars. The Serbian TreeTagger models TT11 and TT19 (see Subsection 3.3.) use modifications of the SMD tagset. A general overview of the tagsets used ...
... Informatica, 28(4):431–436. Krstev, C., Obradović, I., Utvić, M., and Vitas, D. (2014). A system for named entity recognition based on local grammars. Journal of Logic and Computation, 24(2):473–489. Krstev, C. (2008). Processing of Serbian – Automata, Texts and Electronic Dictionaries. University ... Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...... 204-212, University of Bergen, Department of Foreign Languages. Laporte, E. and Monceaux, A. (1999). "Elimination of lexical ambiguities by grammars. The ELAG system", Lingvisticae Investigationes XXII, Amsterdam-Philadelphie : Benjamins, pp. 341-367. Lee, K., Burnard, L., Romary, L., de la ... Cvetana Krstev, Ranka Stanković, Vitas Duško. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
-
An Approach to Development of Bilingual Lexical Resources
... academic purposes. This volume is published and copyrighted by its editors. Local Proceedings also appeared in ISBN 978-86-7031-200-5, Faculty of Sciences, University of Novi Sad. language resources such as grammars in the form of finite automata and transducers, as well as various lexical ... Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012) ... the information extraction problem. A speedy development of IE and QA is expected, given the extent of developed morphological dictionaries and local grammars. There are other fields in which linguistic technology is being applied. One of them is plagiarism detection, which uses language-independent ...
... t of a dictionary of compounds was initiated. Aligned French-Serbian and English-Serbian corpora of literary texts were developed, as well as local grammars for certain segments of Serbian (especially for named entities). Different software tools were also developed, among which special attention should ...
... that the level of development of technologies and resources is satisfactory, mainly due to the existence of large electronic dictionaries and local grammars. An immediate consequence of this fact is that necessary tools for information retrieval and information extraction are available. Some of the ... Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
The Impact Assessment of NO2 Emission from District Heating Plant on Local Air Quality, the Case of Zemun, Belgrade
The Belgrade district heating system relies on fossil fuels in heat production, where natural gas has the largest share of around 95% and fuel oil with around 4.4%. The Zemun heating plant is completely fueled with fuel oil. Currently, it is the largest plant that is not fueled with natural gas. The objective of this paper is to assess the impact of NO2 from the Zemun heating plant on local air quality by analyzing and comparing the concentration of ... NO2, air pollution, local air quality, dispersion model, AERMOD, district heating, Belgrade. Boban Pavlović, Uroš Pantelić, Marija Živković, Dejan Ivezić. "The Impact Assessment of NO2 Emission from District Heating Plant on Local Air Quality, the Case of Zemun, Belgrade" in Conference on Sustainable Development of Energy, Water and Environment Systems, Dubrovnik, 10.10.-15.10.2021., University of Zagreb, Zagreb, Croatia; Instituto Superior Técnico, Lisbon, Portugal (2021)
-
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...... in the standard Serbian that were retrieved from its definition. For lemmatization task we used Serbian morphological electronic dictionaries and grammars developed within the University of Belgrade Human Language Technology Group [14]. Morphological electronic dictionaries of Serbian for NLP are being ...Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
-
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions. ... from the following e-dictionary lines (lower part of the same graph): (8) mirovne,mirovan.A:aefs2g mirovne,mirovan.A:aefp1g 4In Unitex complex grammars can be modelled by using finite-state transducers and e-dictionaries (http://www-igm.univ-mlv.fr/ unitex/) Figure 2: Two paths from a graph ...
... window in which definitions of acronyms are looked for is usually narrow – definitions appear in local context – but authors in (Jacobs et al., 2014) report that they are looking for non-local expansions of acronyms (they need not appear in same documents as acronyms). The third task can be tackled ... Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
-
Старење бунара у алувијалним срединама различитог степена оксичности
Brankica Majkić (2013-09-27) Well ageing occurs as a consequence of corrosion and clogging processes. Corrosion processes can be prevented by installing screen structures made of corrosion-resistant materials. Clogging can occur regardless of the type of material the well structures are made of, and the process can affect the near-screen zone and even the zone of the aquifer itself. For that reason, the thesis focuses on the processes that lead to well clogging and on the resulting decline in the capacity of water intake structures. The initial hypothesis is that wells age at different rates in environments with different degrees of oxicity. Hydrochemical and microbiological characteristics of underground ... well ageing, alluvial aquifers, degree of oxicity of the environment, clogging, well deposits, local hydraulic loss, permissible entrance velocities ... detail to understand the processes leading to deposition. Applying the method for the determination of local hydraulic losses developed at the Belgrade groundwater source, the rate of increase in local hydraulic losses and the permissible entrance velocity were determined for each of the studied wells ...
... bacteria; ICP – (Inductively coupled plasma) – inductively coupled plasma with a mass detector; LHR – (Local hydraulic resistance) – local hydraulic loss; KLHR – (Kinetic of local hydraulic resistance) – kinetics of local hydraulic loss; mnm – metres above sea level; РХМЗ – Republic ...
... doctoral dissertation alluvial sediments, the incrustation analyses and the results of determining local hydraulic losses and permissible entrance velocities to the wells. Comprehensive multidisciplinary research of the selected test areas revealed ... Brankica Majkić. "Старење бунара у алувијалним срединама различитог степена оксичности" in University of Belgrade, Faculty of Mining and Geology (2013-09-27)
-
Keyword Extraction from Parallel Abstracts of Scientific Publications
... Group at the University of Belgrade [30], and (2) a Serbian lemmatizer. For lemmatization, we use Serbian morphological electronic dictionaries and grammars developed within the University of Belgrade Human Language Technology Group [17]. Morphological electronic dictionaries of Serbian for NLP have ... Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion ... Marocco (2008) 7. Jacquemin, C.: Spotting and Discovering Terms through Natural Language Processing. MIT Press (2001) 8. Laporte, E.: Lexicons and Grammars for Language Processing: Industrial or Handcrafted Products? In Rezende, L.M., da Silva, B.C.D., Barbosa, J.B., eds.: Léxico e gramática: dos ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bi-lingual, with papers simultaneously being in English ...... features of Serbian grammar, especially its rich morphology, need corresponding language resources in the form of morphological e-dictionaries and grammars, implemented by finite state automata, finite state transducers and multi-word unit inflection rules (Krstev, 2008). All of them are used to improve ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...... 54th ACL. pages 21–27. Cvetana Krstev, Ivan Obradović, Miloš Utvić, and Duško Vitas. 2014. A System for Named Entity Recognition Based on Local Grammars. Journal of Logic and Computation 24(2):473–489. https://doi.org/10.1093/logcom/exs079. Cvetana Krstev, Miloš Utvić, and Jelena Jaćimović ... Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied to a web portal that gathers various texts originating from the journal of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the "Tara" and "Reiss" conferences, as well as from some doctoral dissertations related to this field of research. After text processing, a corpus containing over 5500 pages of plain text was created and ...... Serbian language15, Serbian and English WordNets, terminological databases: Termi, GeolISSTerm, RudOnto and Librarian dictionary. Apart from the grammars in the form of finite state automata and transducers, the system is using rules for inflection of multiword units. Among textual resources are most important ... Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
-
Глаголи у кухињи и за столом
Цветана Крстев, Биљана Лазић (2015) The paper presents research on the Serbian lexicon of the culinary domain, based on the use of a domain corpus, electronic lexical resources, primarily WordNet and morphological dictionaries, and local grammars. The domain-specific features of these resources are presented, along with how they are used and how they complement one another. In particular, it is shown how a domain corpus can be used to extract verbs specific to the culinary domain and to describe the ways they are used. A list of verbs with basic data, obtained by applying the presented methods, is given. automatic processing, finite-state transducers, electronic dictionaries, semantic networks, local grammars, culinary arts ... culinary domain in Serbian based on the use of the domain corpus, electronic lexical resources – WordNet and morphological dictionaries – and local grammars. We presented the domain characteristics of these resources, how they can be used for research and for mutual enrichment. In more detail we showed ... Цветана Крстев, Биљана Лазић. "Глаголи у кухињи и за столом" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и преимене, Вол. 44/3, Београд : Међународни славистички центар (2015)
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents the activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a repository of word senses. ELEXIS is a parallel multilingual annotated corpus in ten European languages, which can be used as a multilingual benchmark for the evaluation of low- and medium-resourced European languages. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and their comparison with annotations in other languages. We consider the first steps ... Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ... raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data ... English part of the bilingual corpus is tagged by Treetagger [36,37]. Texts included in corpora are also processed using electronic dictionaries and local grammars. It is important to note that text processing and related mining vocabulary expansion is an iterative process. Namely, among other tasks, corpora ...
... Serbia, 2008. 34. Krstev, C.; Stanković, R.; Obradović, I.; Lazić, B. Terminology Acquisition and Description Using Lexical Resources and Local Grammars. In Proceedings of the 11th International Conference on Terminology and Artificial Intelligence, Granada, Spain, 4–6 November 2015; Volume 1495 ...Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892