Search
339 items
-
Keyword Extraction from Parallel Abstracts of Scientific Publications
... Serbian language we use: (1) Stop-word list - prepared at the Human Language Technology Group at the University of Belgrade [30], and (2) a Serbian lemmatizer. For lemmatization, we use Serbian morphological electronic dictionaries and grammars developed within the University of Belgrade Human Language ...
... for the English language (see Table 2). This is in line with our previous findings for the Croatian language [8]. Both, Serbian and Croatian language are morphologically rich, and closely related languages from South Slavic language family. Unlike English, which is inflectional language and has a strict ...
... English and 46.73% for the Serbian language, if we disregard keywords that are not present in the abstracts. In case that we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71% respectively. This work shows that SBKE can be easily ported to a new language, domain and type of text in ...
Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
-
The Nooj System as Module within an Integrated Language Processing Environment
... that contains NooJ as one of its main modules. This environment named WS4LR (WorkStation for Lexical Resources) has been developed within the Human Language Technology Group (HLT) at the Faculty of Mathematics, University of Belgrade, and is aimed at manipulating heterogeneous lexical resources ...
... inflectional graphs: ‘document, dokumenta, dokumentu, dokumentom,..’ 2. Integrated environment for linguistic research 2.1. Motivation The Human Language Technology group has been developing a variety of lexical resources over a long period, reaching a considerable volume to date. These resources ...
... System as Module within an Integrated Language Processing Environment Ranka Stanković, Duško Vitas, Cvetana Krstev Digital Repository of the Faculty of Mining and Geology, University of Belgrade [ДР РГФ] The Nooj System as Module within an Integrated Language Processing Environment | Ranka Stanković ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
Frequency and Length of Syllables in Serbian
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová (2019)
Basic analyses of several properties of syllables (the rank-frequency distribution, the distribution of length, and the relation between length and frequency) in Serbian are presented. The syllabification algorithm used combines the maximum onset principle and the sonority hierarchy. Results indicate that syllables behave similarly to words as far as mathematical models are concerned, but values of parameters in models for syllables are quite different from those for words.
... Syllables in Serbian 117 3. Language material Serbian is a South Slavic language. It has the official status in Serbia (exclusively) and in Bosnia and Herzegovina (as one of three languages, together with Bosnian and Croatian), and the status of a minority language in several other countries. Given ...
... above, with a general syllable definition lacking, a scientist can apply language-specific rules for syllabification (e.g. using morpheme borders as one of the criteria for syllable borders). While the application of language-specific rules is not bad per se, if one wants to compare models, parameter ...
... approach to all languages under investigation is indispensable. If a language allows only open syllables (such as Old Slavonic, cf. Rottmann, 1999), the syllabification is straightforward (provided that diphthongs – if the language under investigation contains any – can be reliably distinguished from ...
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová. "Frequency and Length of Syllables in Serbian" in Glottometrics (2019)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... 3 http://www.illc.uva.nl/EuroWordNet/sample.html 4 http://nlp.fi.muni.cz/projekty/visdic/ 3. A Multifunctional Language Resource Tool 3.1 Motivation The Human Language Technology group at the University of Belgrade has been developing various lexical resources over quite a long period ...
... match in the target language, regardless of the fact whether these target language synsets have previously been retrieved from the wordnet by the user or not, and which PWN synsets do not have a match. The latter are obviously candidates for new synsets in the target language. Figure 9. ...
... dictionary and hierarchical thesaurus for a particular language, opens two critical issues. The first pertains to the organization of the conceptual network. Simply put, the issue is how to define the concepts for a particular language and how to establish links among them? In other words, ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007 (2007)
-
Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution
This paper explores the effectiveness of parallel stylometric document embeddings in solving the authorship attribution task by testing a novel approach on literary texts in 7 different languages, totaling 7,051 unique 10,000-token chunks from 700 PoS and lemma annotated documents. We used these documents to produce four document embedding models using Stylo R package (word-based, lemma-based, PoS-trigrams-based, and PoS-mask-based) and one document embedding model using mBERT for each of the seven languages. We created further derivations of these ...
Mihailo Škorić, Ranka Stanković, Milica Ikonić Nešić, Joanna Byszuk, Maciej Eder. "Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution" in Mathematics, MDPI AG (2022). https://doi.org/10.3390/math10050838
-
WS4LR - a Workstation for Lexical Resources
... ivano@afrodita.rcub.bg.ac.yu Abstract In this paper we describe WS4LR, the workstation for lexical resources, a software tool developed within the Human Language Technology Group at the Faculty of Mathematics, University of Belgrade. The tool is aimed at manipulating heterogeneous lexical resources, and ...
... and runs on a personal computer under Windows 2000/XP/2003 operating system with at least 256MB of internal memory. 1 Introduction The Human Language Technology group at the Faculty of Mathematics has been developing various lexical resources over quite a long period, reaching a considerable ...
... criteria in the source language are highlighted (Figure 5). Figure 4. The form for expansion of the search criteria The user can also use the translation equivalence option which is aimed at locating equivalences in target language for occurrences found in the source language. This is done on ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)
In this paper, we will present an approach in transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use category labels lexical entries are marked with can be used to link lexical entries. ...
... Framework - LMF). LMF is designed for lexicons specially designed for Natural Language Processing and Machine-Readable Dictionaries. LMF specification is represented as a subset of UML (Unified Modeling Language) language that provides linguistic description. The LMF consists of mandatory Core package ...
... used for natural language processing - NLP. 3 TEI 4 LMF 5 Lemon 84 Infotheca Vol. 19, No. 2, December 2019 Scientific paper The LMF prescribes a standardized framework for recording linguistic information in computer lexicons and is based on the Standard ISO 24613:2008 (Language Resource Management ...
... in the Serbian Corpus of the Serbian Language SrbCorp (version of 122 million words by Duško Vitas and Miloš Utvić)6. Information about the Corpus is stored in the KorpusMeta table. The LexicalRelation table stores information 6 Corpus of the Serbian Language – SrbCorp 86 Infotheca Vol. 19, No. ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management, and exploitation of documents from the domain of criminalistics written in Serbian, including full-text search. The proposed approach is applied on a web portal that gathers various texts from the journals of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the “Tara” and “Reiss” conferences, and some doctoral dissertations related to this field of research. After text processing, a corpus containing over 5,500 pages of plain text was created and ...
... they are not (e.g. simply provocative). 8 SOFTWARE SOLUTIONS MODEL The human language processing group (HLT group) at the University of Belgrade is engaged for many years now in a task of producing various language resources9, both corpora and lexicons. Given the fact that these resources have ...
... LINGUISTICS The linguistic study of forensic texts is a part of the field of Natural Language Processing, which includes text types classification and syntax and semantic analysis of texts written in a natural language. Various texts are subject of the study: Acts of Parliament (or other law-making ...
... Sixth International Conference on Language To keep development and use of the applications and resources at the same time, without frequent conversions, the strategy for the development was to support original formats used in another software tools for language resources processing (Unitex, WordNet ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
-
An Approach to Development of Bilingual Lexical Resources
... keywords. The paper also outlines linguistic criteria used for building language resources for French, Italian, and German, and the use of multi-term descriptors as a means to better identify the content. The Human Language Technology group at the University of Belgrade developed Bibliša (http://hlt ...
... Multilingual textual repositories, such as digital libraries of e-journals represent a specific type of language resources. Efficient search of these resources usually relies on specific language tools, which often use other available resources, such as e-dictionaries, wordnets and the like. An ...
... University of Novi Sad. 102 language resources such as grammars in the form of finite automata and transducers, as well as various lexical resources. Bibliša is able to expand search queries both morphologically and semantically, as well as to another language. One type of lexical resources ...
Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction
Velislava Stoykova, Ranka Stanković (2018)
Velislava Stoykova, Ranka Stanković. "Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction" in Advances in Intelligent Systems and Computing, Springer International Publishing (2018). https://doi.org/10.1007/978-3-319-91189-2_16
-
Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014)
... TEL platform consists of tools and resources: learning, language and implementation resources. Among the tools some are available open source and commercial tools, some are in-house tools developed by the University of Belgrade Human Language Technology Group. Learning resources are both academic: ...
... 4 The language support system The need for multilinguality of OER is a combined effect of globalization and European integration, favoring a holistic approach that takes into account all the languages a learner may use, as opposed to the more traditional approach looking at one language at a time ...
... 802 The language support system, whose structure is outlined in Figure 3, is based on electronic language resources, namely, lexical resources, textual resources and grammars. Bilingual dictionaries in electronic ...
Ivan Obradović, Ranka Stanković. "Using technology for knowledge transfer between academia and enterprises" in Knowledge and Management Models for Sustainable Growth, Proc. of IFKAD 2014, 9th International Forum on Knowledge Asset Dynamics, 11-13 June 2013, Matera, Italy, Bari : IFKAD (2014)
-
GIS Application Improvement with Multilingual Lexical and Terminological Resources
... ac.rs Abstract This paper introduces the results of integration of lexical and terminological resources, most of them developed within the Human Language Technology (HLT) Group at the University of Belgrade, with the Geological information system of Serbia (GeolISS) developed at the Faculty of Mining ...
... The research described in this paper is based on an integration of lexical and terminological resources, most of them developed within the Human Language Technology (HLT) Group at the University of Belgrade, and the Geological information system of Serbia (GeolISS), developed at the Faculty ...
... tool, a workstation for language resources, named WS4LR, which greatly enhances the potential of manipulating each particular resource as well as several resources simultaneously (Krstev et al., 2008). This tool has already been successfully used for various language processing related tasks ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, May 2010, Valletta, Malta : European Language Resources Association (2010)
-
Softverski alati za korišćenje resursa za srpski jezik [Software Tools for Using Serbian Language Resources]
Ivan Obradović, Ranka Stanković (2008)
... words of a particular language systematized and organized in a specific manner, are developed in various formats. Thus, for example, several different types of e-dictionaries, along with other lexical and textual resources, are being developed within the Human Language Technology Group, which ...
... BalkaNet languages are spoken, but also from France and Netherlands. A national development team was formed for each language, and in the case of Serbian this team was the Human Language Technology Group at the University of Belgrade. Upon the termination of this project, the development of SWN contin- ...
... with the acronym WS4LR (Workstation for Lexical Abstract: In this paper we describe how lexical resources for Serbian, developed within the Human Language Technology Group, such as various types of electronic dictionaries and aligned texts, can be further refined and used for different purposes ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
Advantages of python programming language in hydrological model development
Milan Tucaković, Dragoljub Bajić, Vesna Ristić Vakanjac, Dušan Polomčić. "Advantages of python programming language in hydrological model development" in Proceedings of the XVIII Serbian Geological Congress, Divčibare, Serbia, 01-04 June 2022, Serbian Geological Society (2022)
-
Дигиталне библиотеке у рударству и геологији са посебним освртом на представљање сиве литературе [Digital Libraries in Mining and Geology with Special Reference to the Presentation of Grey Literature]
Bearing in mind the need to find information stored in the various forms of documentation generated in the fields of mining and geology at the Faculty of Mining and Geology of the University of Belgrade, the development of the digital library ROmeka@RGF was started on Omeka, a platform for displaying digital collections. A significant part of the documentation is so-called grey literature, which is predominantly in the form of multi-volume documentation. The first challenge overcome was linking the different volumes of multi-volume project reports into a single whole that would be easily accessible and searchable.
... Томашевић], 2018. Tomašević, Aleksandra, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. „Managing mining project documentation using human language technology“. The Electronic Library Vol. 36 Issue: 6 (2018): 993-1009. Ћирковић, Сњежана. „Сива литература – камелеон информационих ресурса“ ...
... which are designed to define document relations. We will also present some language resources for Serbian language which are used to improve information retrieval. Keywords: digital libraries, grey literature, Omeka, language resources, dictionaries. ...
... to Improve the Performance of Web Search Engines”. Sixth International Conference on Language Resources and Evaluation (LREC ‘08), Marrakech, Morocco. Nicoletta Calzolari et al. (eds.). Marrakech : European Language Resources Association (ELRA), 2008. Okoroma, Francisca. „Grey Literature Management ...
Биљана Лазић, Александра Томашевић, Михаило Шкорић. "Дигиталне библиотеке у рударству и геологији са посебним освртом на представљање сиве литературе" in Научна конференција Библиоинфо — 55 година од покретања наставе библиотекарства на високошколском нивоу, Београд 18. мај 2017., Филолошки факултет Универзитета у Београду (2019). https://doi.org/10.18485/biblioinfo.2017.ch13
-
Речници у дигиталном добу - информатичка подршка за српски језик [Dictionaries in the Digital Age - IT Support for the Serbian Language]
Биљана Рујевић (2022)
Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were stored as files whose number kept growing, which made managing the dictionaries difficult, the need arose to move the information from the dictionaries into a lexicographic database. To allow several users to work on dictionary development simultaneously, the need arose for a web application based on the lexicographic database. In order to consider ...
Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
-
Development of integrated fuzzy model for mine management optimization
Miodrag Čelebić, Sanja Bajić, Dragoljub Bajić, Dejan Stevanović, Duško Torbica, Vladimir Malbašić (2023)
... inaccuracies. As a result, subjective evaluation by engineers and expert experience have become increasingly important. Given that the natural language used by miners and geologists is most suited for relaying knowledge and expressing opinions, the paper tests a fuzzy optimization methodology ...
... physical and mechanical rock parameters, or environmental concerns. Likewise, the proposed methodology can be applied to consider other mining technologies when selecting the optimal alternative. Acknowledgements. The authors express their gratitude to the Ministry of Science, Technological ...
... 97, 89-117. CHEN H. (2006) Applications of Fuzzy Logic in Data Mining Process. In: Bai Y., Zhuang H., Wang D. (eds), Advanced Fuzzy Logic Technologies in Industrial Applications, Advances in Industrial Control, London, Springer, DOI: 10.1007/978-1-84628-469-4_17. BAJIĆ S., D. BAJIĆ, B. GLUŠČEVIĆ ...
Miodrag Čelebić, Sanja Bajić, Dragoljub Bajić, Dejan Stevanović, Duško Torbica, Vladimir Malbašić. "Development of integrated fuzzy model for mine management optimization" in Comptes rendus de l'Académie Bulgare des Sciences (2023)
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...
... 1. Motivation In this paper we outline the main features of Bibliša (http://hlt.rgf.bg.ac.rs/Biblisha), a tool developed within the Human Language Technology group at the University of Belgrade, aimed at enhancement of search possibilities in multilingual digital libraries of e-journals ...
... text in the first TUV is usually in the source language, and the texts in the remaining TUVs are in one or more target languages. Although the order of languages is the same in each TU, there is a TUV attribute xml:lang that denotes the language of the text within the TUV. The performance of ...
... metadata All metadata, except language independent data, such as the numeration metadata (, , , , ), the and , are entered in both languages (Serbian and English), using the attribute xml:lang to denote the language of the content (see Figure 2) ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encoding was used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text of and transform it into structured data stored in relational lexical database. This approach ...
... database, language resources, dictionary, Serbian language 1 Introduction The first volume of the Dictionary of the Serbo-Croatian Standard and Vernacular Language (referred to as the Dictionary of Serbian Academy or DSA), prepared and compiled by the Institute for the Serbian Language of the Serbian ...
... the Lexical Database Ranka Stanković1, Rada Stijović2, Duško Vitas1, Cvetana Krstev1, Olga Sabo2 1University of Belgrade, 2Institute for Serbian Language, Serbian Academy of Sciences and Arts E-mail: ranka.stankovic@rgf.bg.ac.rs, rada.stijovic@isj.sanu.ac.rs, vitas@matf.bg.ac.rs, cvetana@matf.bg ...
... olga011@yahoo.com Abstract In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encoding was used in the course of the ...
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and researching verbal aspectual pairs in BCS (Bosnian, Croatian, and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since there is no resource that helps learners of Bosnian, Croatian, and Serbian as foreign languages recognize the aspect of a verb or its pairs, we created a new resource that gives users information about aspect as well as a link to the verb's aspectual pairs. The resource also contains external links to monolingual dictionaries, Wordnet, and BabelNet. ...
Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)