Search
99 items
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...... English and Serbian. The structure of bilingual parallel lists has in general the following form: TermI[,TermI]*; TermII[,TermII]* where TermI represents a word in one language, and TermII a corresponding word in another. A few examples from the en-sr parallel list are: grandmother;baka,baba soft dr ...
... PWN, either by merging it into an existing synset, or adding it as a new hyponym synset. Keywords: WordNet, bilingual resources, term alignment, parallel lists 104 Five teams submitted 13 systems, with all teams performing better than chance, but only one team surpassing a simple baseline, thus ...
... on-line resources and textbooks and used for generating lists of translational equivalents. Finally, we compiled a list of aligned en-sr terms from the web site of the Serbian Institute for Standardization. 3. Production The final parallel list of translation equivalents compiled from all of the ...Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
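The parallel-list format quoted in the snippet above, TermI[,TermI]*; TermII[,TermII]*, can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `parse_parallel_line` and `build_equivalents` are hypothetical, the only data is the single example line visible in the snippet, and nothing here is taken from the system described in the paper.

```python
from collections import defaultdict


def parse_parallel_line(line):
    """Split one 'TermI[,TermI]*; TermII[,TermII]*' line into two term lists."""
    left, sep, right = line.partition(";")
    if not sep:
        raise ValueError(f"missing ';' separator: {line!r}")
    # Terms on each side are comma-separated; drop surrounding whitespace.
    source = [t.strip() for t in left.split(",") if t.strip()]
    target = [t.strip() for t in right.split(",") if t.strip()]
    return source, target


def build_equivalents(lines):
    """Map every source-side term to the set of its target-side equivalents."""
    table = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        source, target = parse_parallel_line(line)
        for term in source:
            table[term].update(target)
    return table


# The one example line visible in the snippet above.
equivalents = build_equivalents(["grandmother;baka,baba"])
print(sorted(equivalents["grandmother"]))
```

A lookup table of this kind is one plausible way to offer candidate literals for a synset; how the actual system ranks or filters candidates is not shown in the snippet.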
-
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...... boxes (WordNet and/or Dictionary of librarianship). The system responds with several editable lists of keywords depending on what is found in resources chosen for expansion. The user can edit these lists by deleting some keywords or adding new ones. For example, if the user submitted biblioteka ...
... translation. Thus, for example, the OPUS corpus offers freely available parallel corpora in many languages, as well as interfaces for querying the corpus data [Tiedemann, 2009]. Another example of a system that uses parallel corpora for information retrieval is given in [Gravano, 2006]. The ...
... search of document collections consisting of aligned parallel texts converted in TMX (Translation Memory eXchange) format. TMX is an open XML-based standard intended for easier exchange of translation memory data, that is, aligned parallel texts, between tools and translation vendors [TMX, 2005] ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
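Since TMX is an XML-based format for aligned parallel texts, a keyword search over translation units can be sketched with Python's standard library. Everything below is illustrative: the sample document and its two sentences are invented for this example (only the keyword biblioteka comes from the snippet above), the function names `read_units` and `search` are hypothetical, and this is not Bibliša's implementation.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented two-language sample; a real TMX file would hold many <tu> elements.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The library is open.</seg></tuv>
      <tuv xml:lang="sr"><seg>Biblioteka je otvorena.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            unit[tuv.get(XML_LANG)] = text
        units.append(unit)
    return units


def search(units, keyword, lang):
    """Case-insensitive substring match in the segment of the given language."""
    needle = keyword.lower()
    return [u for u in units if needle in u.get(lang, "").lower()]


units = read_units(SAMPLE_TMX)
for hit in search(units, "biblioteka", "sr"):
    print(hit["sr"], "|", hit["en"])
```

Plain substring matching is the simplest possible baseline; the semantic and morphological query expansion that the abstract describes would add related keywords and inflected forms before this matching step.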
-
The Development of the GeolISSTerm Terminological Dictionary
... Glossaries are lists of terms with definitions that can be monolingual, bilingual or multilingual. In the case of a multilingual linking of terms, parallel multilingual ...
... or bilingual lists, namely, lists of paired terms are often made. These lists usually do not contain definitions of terms. Further arrangement of the resources on the semantic scale (from weakly to strongly represented semantic descriptions) is aimed at the introduction of relationships among ...
... ideas about the development and role of terminology in the geologic information system revolved around a variety of views, from the opinion that lists of geology terms for each domain should be made or favouring the idea of simply taking the terms from Geologic Terminology and Nomenclature edited ...Ranka Stanković, Branislav Trivić, Olivera Kitanović, Branislav Blagojević, Velizar Nikolić. "Развој геолошког терминолошког речника ГеолИССТерм" in INFOteka: časopis za informatiku i bibliotekarstvo, Beograd : Zajednica biblioteka univerziteta u Srbiji (2011)
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020) In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between different language variants. text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers ... contains letters c, s, z or digraphs dj, dz, a list of zero or more candidates obtained from the dictionary SMD_DR, or one candidate obtained from lists of trigrams or bigrams, or a dictionary of MWUs for a sequence of words. The result of the application of the procedure to a sample text is given in ...
... ‘bey (genitive)’) vs. bjega ‘to run away (aorist)’. Both systems use the same procedure for detecting words that should be “corrected” and producing lists of candidates for replacement, although, obviously, the resources they use are different. The results produced by two systems on two sample texts are ...
... concrete problems (diacritic restoration, language variant switching). – Detecting words that are candidates for change as well as the production of lists of candidates for replacement is done by finite-state transducers implemented in Unitex software (Paumier et al., 2016). – All presented systems consist ...Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion... applied the strategy to the same data that we used to produce it, that is our initial DELAC dictionary. In the second step, we applied it to several lists of MWUs that we have collected from various sources. After applying our strategy to the initial DELAC dictionary containing 2571 nouns and 207 adjectives ...
... 10% 0.00% Total 73.42% 14.72% 11.87% 100.00% 77.07% 20.00% 2.93% 100.00% In the second step we applied our strategy for nouns to several different lists of MWUs. We have not applied our strategy for adjectives in this step simply because ...
... the letter R (604). As the analysis in the first step indicated that MWU proper names produce in general worse results, we decided to separate these lists in two groups. After removing those already in DELAC we got a list of MWU toponyms (206) and a list of MWU common nouns (784). Table 5 shows that in ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC
OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data and Semantic Web technologies, is currently being extended with a dedicated module for Frequency, Attestations and Corpus-Based Information (OntoLex-FrAC). We propose a new component for OntoLex-FrAC that addresses the incorporation of corpus queries for (a) linking dictionaries with corpus engines, (b) enabling RDF-based web services to dynamically exchange corpus queries and response data, and (c) using conventional query languages to formalize the internal structure of collocations, word sketches and ... standardization, digital lexicography, OntoLex, corpus queries, linked data, Linguistic Linked Open Data Christian Chiarcos, Ranka Stanković, Maxim Ionov, Gilles Sérasset. "Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, 20-25 May 2024, LREC (2024)
-
A Tel Platform Blending Academic And Entrepreneurial Knowledge
... different forms. Indexes are the simplest form, basically just lists of terms, usually arranged in alphabetical order. Glossaries are slightly more complex and they can be monolingual, bilingual or multilingual. They are lists of terms with definitions, and in the case they are bilingual or ...
... order to offer support in expert terminology within the multilingual approach, the BAEKTEL platform provides electronic terminological resources, parallel (multilingual) corpora of lessons and texts in written form, and functionalities for searching and browsing of terminological resources and ...Ivan Obradović, Ranka Stanković, Jelena Prodanović, Olivera Kitanović. "A Tel Platform Blending Academic And Entrepreneurial Knowledge" in Proceedings of the Fourth International Conference on e-Learning (eLearning-2013), September 2013, Belgrade, Serbia, Belgrade, Serbia : Belgrade Metropolitan University (2013)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data... bilingual parallel corpus. Also, paper dictionaries, both monolingual and bilingual are digitized, parsed and stored in an auxiliary database as structured data in XML format. Figure 3. The pipeline for terminology compilation (termbase population). Compiled resources also comprise monolingual lists derived ...
... automatic data extraction, editing and publishing extracted data in (online) dictionaries. Using extracted lexicographically relevant data (lemma lists, example sentences, collocations) as complementary resources in electronic dictionaries is known as the one-click dictionary or push-pull dictionary ...
... digitization of paper dictionaries, enlargement of corpora, adding domain terms to general purpose morphological e-dictionaries and extraction of bilingual lists. The process of terminology compilation, from the perspective of monolingual and bilingual extraction, as well as the web and mobile form of the dictionary ...Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
WS4LR - a Workstation for Lexical Resources
... systems and their respective graph management tools. The same is true for other types of graphs produced and used within Intex/Unitex. 2.5 Bilingual Lists As a result of various translation and lexicographic projects various unstructured bilingual wordlists from various domains were produced. An ...
... with two wordnets, the user can copy a synset from one wordnet to another thus synchronizing them automatically via the ILI. Unstructured bilingual lists may be used to suggest possible candidates for a synset. The module also performs various consistency checks on wordnets such as detecting dangling ...
... Multilingual Semantic Network for Balkan Languages, in Proc. of 1st International Wordnet Conference, Mysore, India Veronis, J. (ed.) (2000) Parallel Text processing: Alignment and Use of Translation Corpora, Dordrecht: Kluwer Academic Publishers Vossen, P. (ed.) (1998) EuroWordNet: A Mu ...Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry and many research fields, terminology is developing rapidly. Most often, the language that serves as the "lingua franca" for most of these fields is English. As a consequence, in many fields domain terms are conceived in English and later translated into other languages. In this paper we present an approach for automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a tool for chunk alignment. We examine the performance of the method on the domain ...... a bilingual dictionary with no parallel texts and the second one requiring only the existence of a small amount of parallel data. In order to compile a bilingual lexicon for a specific domain, we combined and compared several settings. Besides using only a parallel sentence-aligned corpus, we conducted ...
... presentation of bilingual correspondences between two languages (e.g. correspondences between Slovak-Bulgarian parallel corpus (Garabík and Dimitrova, 2015)). Some approaches request parallel sentence-aligned data (Arcan et al., 2017; Garabík and Dimitrova, 2015; Bouamor et al., 2012; Semmar, 2018) ...
... Extraction From Parallel Corpora”. In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015, Vol. 130, Accademia University Press, 2015 Garabík, Radovan and Ludmila Dimitrova. “Extraction and Presentation of Bilingual Correspondences from Slovak-Bulgarian Parallel Corpus”. ...Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
-
Towards a Mining Equipment Ontology
... bilingual or multilingual, and in the case of bilingual or multilingual glossaries, corresponding terms in different languages are often linked by lists of paired terms. On the next level of semantic scale of terminological resources, relationships 3 http://geoliss.ekoplan.gov.rs/term 2 ...
... for specific features. This can be achieved by creating appropriate domains. Domains are represented by terms, which can be in the form of simple lists of terms or hierarchical trees of terms. However, in both cases these terms can only be the ones that are present in the relevant terminological resource ...Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Towards a Mining Equipment Ontology" in Proceedings of the 12th International Conference Research and Development in Mechanical Industry, RaDMI 2012, September 2012, Vrnjačka Banja, Serbia no. 1, Vrnjačka Banja, Serbia : SaTCIP (Scientific and Technical Center for Intellectual Property) Ltd. (2012)
-
SrpELTeC: A Serbian Literary Corpus for Distant Reading
The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared and annotated following the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions in preparing SrpELTeC from scratch are described. All novels were manually encoded in TEI with rich metadata and structural annotations. Automatic annotation included POS tagging, lemmatization and named entities, relying on resources for processing ...digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03
-
English for Geology Students. 2
Lidija Beko (2023)... including modern educational technologies. As previously emphasized, the aim of teaching vocabulary is its active application and all the word lists offered will be brought to life only by their activation in many different ways and through consistent work. I would like to use this opportunity ...Lidija Beko. English for Geology Students. 2, Belgrade : The Faculty of Mining and Geology, 2023
-
Environmental Energy Security Indicators as Tools for Environmental Protection
Environmental acceptability has become an indispensable synthesis indicator of every valid energy analysis over the last two to three decades. Energy security is in direct harmony with environmental acceptability through many prescribed policy goals, economic benefits, legal acquis, etc. Environmental security indicators are one of the basic elements for determining energy security and powerful tools for steering the energy sector towards sustainable development. In this paper, the analysis concentrated on environmental energy security indicators related to the natural gas sector in ...energy security, environmental energy indicators, environmental protection, environmental acceptability... electricity production, etc.). For the analysis in this case study, two characteristic projections of natural gas consumption were identified. Table 5 lists the projections of absolute natural gas consumption [5, 23]. In terms of infrastructure development, according to [24] three characteristic scenarios ...
... sulfide is included in the natural gas composition. The amount of NOx emitted is directly related to projections of natural gas consumption. Table 6 lists the projections of natural gas consumption specifically per unit of energy produced. At the same time, the table also shows the amount of NOx emitted ...Aleksandar Madžarević, Dejan Ivezić, Marija Živković, Miloš Tansijević. "Environmental Energy Security Indicators as Tools for Environmental Protection" in Mining and Environmental Protection-MEP 2019, University of Belgrade, Faculty of Mining and Geology (2019)
-
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, as well as the possibilities of its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...... Language) transformations: frameIndex, luIndex, fulltextIndex. In this section, we will show the use of the FrameNet wrapper. The function frames() lists all the frames contained in FrameNet. The following lines of code illustrate the initialization of working with FrameNet and return the information ...
... Concordances for adjective-noun pattern containing the noun ризик The results of a CQL (Corpus Query Language) query are analyzed for: frequency lists, collocations, concordances with a narrower and broader context. Figure 5 shows the concordances extracted from the Leximirka digital dictionary ...
... (sudden). The automatically generated thesaurus for the target word finds synonyms or words that fall in the same category (same semantic field) and lists them in a table with links to the sketches of individual words, concordances, word sketch differences and thesauruses. Figure 12 shows an illustration ...Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
-
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...... of aligned parallel texts Parallel texts, which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph, sentence, etc) by matching the corresponding segments of the original and its translation. Aligned parallel texts are ...
... use aligned texts. If PWN is used for the source synset, then the language of one of the parallel texts must be English. Namely, WS4LR allows the user to search aligned texts using words from both parallel texts. All of the words found in both texts will be highlighted (in blue color) (Figure ...
... has to assign senses to all chosen words. It goes without saying that other linguistic resources, such as electronic dictionaries, bilingual word lists and corpora can be of invaluable help to the lexicographer in accomplishing this task. In this paper we present a multifunctional tool which ...Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)