Search ⚒ Papers ⚒ Dr RGF - RGF Repository

Search

83 items

Terminology Acquisition and Description Using Lexical Resources and Local Grammars

Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić (2015)

Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...

... transducers using CasSys tool incorporated in Unitex1 corpus processing platform, as well as the use of TMF standard for the representation of terms is proposed in (Ammar et al., 2015) and applied on Arabic scientific and technical corpus. In (Savary et al., 2012) terminology extraction in the ...
... ported that modern statistical Natural Language Processing (NLP) is in great need of better language models and linguistic tools must come to 1 Corpus processing System Unitex: http://www-igm.univ-mlv.fr/~unitex/ Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada ...
... extraction In order to evaluate our approach, we applied it to a collection of 74 papers in Serbian from the journal Infotheca. 6 The size of the corpus is 6 Infotheca - Journal for Digital Humanities (http://infoteka.bg.ac.rs/index.php/en/infoteca) Proceedings of the conference Terminology and ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
Multi-word Expressions for Abusive Speech Detection in Serbian

Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev (2020)

This paper presents research on refining and improving the Serbian version of the Hurtlex lexicon, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (polylexemic units) that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as a basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the building of a system for ...

abusive speech, hate speech, lexical resources, multilingual lexicon, multi-word expressions

... the domain corpus of hateful content and Subjectivity lexicon of Therese Wilson in combination with the SentiWordNet (Esuli and Sebastiani, 2006). For classification, they leveraged rules and achieved a result of F1 = 0.783 for strongly hateful sentences on a manually annotated domain corpus. Razavi ...
... Processing Paradigm for Balkan Languages, pages 15–22. Cvetana Krstev, Jelena Jaćimović, and Duško Vitas. 2020. Analysis of similes in Serbian literary texts (1840-1920) using computational methods. In Svetla Koeva, editor, Proceedings of the Fourth International Conference Computational Linguistics ...
... hyperbole, litotes etc. Initial work on detecting some of these figures has been presented in (Mladenović et al., 2017; Krstev et al., 2020). Using a corpus of newspaper articles from 2006, Krstev et al. (2007) presented the results of an information search experiment in search of attacks which are the ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School

Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković (2021)

The first training school organized by the COST Action NexusLinguarum was held from 8 to 12 February 2021, with the aim of teaching students, researchers and professionals the basics of linguistic data science. During the training, participants were introduced to a wide range of topics: from the Semantic Web, RDF and ontologies, to modelling and querying language data using state-of-the-art ontological models and tools. The school was held as part of the EUROLAN series of summer schools and was organized virtually (online) by several institutes; ...

linguistic data science, linked data in linguistics, language data, EUROLAN, NexusLinguarum, COST Action, training school

... September 2021 115 Dojchinovski M. et al., eurolan 2021: . . . Linked Data. . . , pp. 113–120 Ponsoda 2017), FrAC 12 – frequency, attestation and corpus Information (Chiarcos et al. 2020). Finally, the training school ended with a closing session where an ontology of participants, lecturers and ...
... and building on to present more specific topics in a detailed fashion on the last day, the participants had 12. FrAC – Frequency, Attestation and Corpus Information - Ontology-Lexica Community Group 116 Infotheca Vol. 21, No. 1, September 2021 Professional paper a chance to acquire a solid foundation ...
... Lex Frac module was used for representation of the entries from the lexicon used for abusive speech detection with attestations from the Twitter corpus with annotation of abusive spans (Jokić et al. 2021). 3 Organization Due to the COVID-19 pandemic and current travel restrictions in Europe and beyond ...
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
From DELA Based Dictionary to Leximirka Lexical Database

Biljana Lazić, Mihailo Škorić (2020)

In this paper, we will present an approach in transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use category labels lexical entries are marked with can be used to link lexical entries. ...

Morphological dictionaries, language resources, Leximirka

... 000 most frequent words in the Serbian Corpus of the Serbian Language SrbCorp (version of 122 million words by Duško Vitas and Miloš Utvić)6. Information about the Corpus is stored in the KorpusMeta table. The LexicalRelation table stores information 6 Corpus of the Serbian Language – SrbCorp 86 ...
... that match the specified search criteria appear as rows in the table. The registered user has access to multiple corpus searches (in the MatKorp and SrpKorpRGF corpora). The Mining Corpus (RudKorp) (Tomašević et al., 2018) that can be searched by some predefined queries that retrieve a word searched ...
... their main importance is their reusability. They were used for the basic tasks of word processing, automatic recognition 1 Unitex is cross-platform Corpus Processing Suite to retrieve data. Infotheca Vol. 19, No. 2, December 2019 81 Lazić B., Škorić M., “From DELA based dictionary to . . . ”, pp ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
Infotheca (Q25460443) in Wikidata

Ranka Stanković, Lazar Davidović (2021)

Wikidata is the knowledge base of the Wikimedia Foundation, a shared source of various kinds of data used not only by other Wikipedia projects but increasingly by numerous Semantic Web applications. In this paper we present an example of integrating Wikidata with digital libraries and external systems, as well as the possibility of speeding up data preparation and entry, using papers from Infotheca, the journal for digital humanities, as an example.

semantic web, open linked data, Wikidata, Infotheca, journal metadata

... open data network was used by Andonovski (Андоновски 2020) to describe language resources, namely, novels forming part of the Serbian-German literary corpus (Andonovski, Šandrih, and Kitanović 2019). For a number of years now, students at the Faculty of Mining and Geology have been undergoing training ...
... of open data. As part of the “Distant Reading for European Literary History” COST Action CA16204 (2017-2021), metadata about Serbian novels included in the srpELTeC corpus is being entered into the knowledge base (Krstev et al. 2019) ...
... 10. Wikimedia 11. Input data to Wikidata and their use 12. One of the most important aims of this action is preparing a multilingual corpus (titled European Literary Text Collection - ELTeC) which, when fully complete, will feature a hundred novels from each participating country first published in ...
Ranka Stanković, Lazar Davidović. "Infotheca (Q25460443) in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.5
Resource-based WordNet Augmentation and Enrichment

Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev (2018)

In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...

WordNet, bilingual resources, term alignment, parallel lists

... wordnets. The English part of each corpus was semantically tagged, after which the process of wordnet creation was transformed into a word alignment problem, where wordnet synsets in the English part of the corpus were aligned with in the target language part of the corpus. The obtained precision was s ...
... with domain-specific single and multi-word expressions. They used a large monolingual Slovene corpus of texts to extract terminology from the domain of informatics, and a parallel English-Slovene corpus and an online dictionary as bilingual resources to facilitate the addition of new terms to sloWNet ...
... parallel resources, and search for new pairs of aligned literals for synsets, which will then be manually post-edited. We also plan to use parallel corpus based methodologies relying on two strategies proposed in (Oliver et al., 2015) for automatic construction of the required corpora: by machine ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
Transformer-Based Composite Language Models for Text Evaluation and Classification

Mihailo Škorić, Miloš Utvić, Ranka Stanković (2023)

Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević (2024)

This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the Mistral large language model (LLM) with learning techniques based on so-called "zero-shot" and "few-shot" attempts. The main approach innovates by designing research queries (prompts) that include text with classification instructions, either without examples or based on a few examples, enabling the language model to classify sentiment into positive, negative or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...

zero-shot, few-shot, sentiment, Serbian, Mistral model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Sofia, Bulgaria, 9-10 September 2024, LREC | COLING (2024)
The Nooj System as Module within an Integrated Language Processing Environment

Ranka Stanković, Duško Vitas, Cvetana Krstev (2008)

NooJ, electronic dictionary, lexical resources

... lex-resources to texts”) then syntactic resources should not be chosen, and if the last option is on (“Apply query to corpus”), then the user selects only a query and a corpus. Figure 12 presents results in the form of concordances for the query: kompjuter, which was automatically expanded with ...
... retrieval and related areas. If query is further combined with ILI, a multilingual wordnet pivot, the possibility of searching text resources (web, corpus, text) in different languages with a single query is opened. NooJ supports morphological query expansion and expansion of queries by graphs and ...
... in information retrieval and related areas. Combined with the wordnet ILI, this approach opens the possibility of searching text resources (web, corpus, text) in different languages with a single query. Powerful linguistic tools such as NooJ, though inherently multilingual since resources for ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
Frequency and Length of Syllables in Serbian

Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová (2019)

Basic analyses of several properties of syllables (the rank-frequency distribution, the distribution of length, and the relation between length and frequency) in Serbian are presented. The syllabification algorithm used combines the maximum onset principle and the sonority hierarchy. Results indicate that syllables behave similarly to words as far as mathematical models are concerned, but values of parameters in models for syllables are quite different from those for words.

syllable frequencies, syllable length, Serbian language

... onsets and codas. If one follows his modification, a large enough corpus is needed to perform statistical tests, based on which a decision on the (non-) marginality of a particular consonant cluster is made. Finding or creating such a corpus can be problematic for minor languages (such as e.g. Lower and ...
... socialist realist novel “Kak zakalyalas’ stal’” (How the Steel Was Tempered) by N. Ostrovsky. The choice is motivated by the fact that a parallel corpus consisting of the first ten chapters of the novel and their translations to all standard Slavic languages (except for Lower Sorbian) is available ...
... for Croatian), or using the approach suggested by Pulgram (1970) and modified by Lehfeldt (1971), with its drawback of needing a sufficiently large corpus (Kelih & Mačutek, 2013, for Russian and Slovene), or not at all (because the mean syllable length in words was sufficient for the purposes of the ...
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová. "Frequency and Length of Syllables in Serbian" in Glottometrics (2019)
Part of Speech Tagging for Serbian language using Natural Language Toolkit

Ranka Stanković, Boro Milovanović (2020)

While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated ...

natural language processing, machine learning, neural networks

... language data Repository Area) is a project that produced multilingual corpus on law, health and education [10]. Around the world in 80 days is a novel by Jules Verne annotated during SEE-ERA.net project [11]. ELTeC (European Literary Text Collection) is a multilingual collection of the novels written ...
... HLT Group and Jerteh, Lexical resource, 2.0, 2015 [15] A. Balvet, D. Stošić, and A. Miletić, (2014). TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 4105-4110 ...
... on South Slavic and Balkan Languages”. Scientific results of the SEE-ERA.NET pilot joint call, pp 5, Oct. 2009 [12] Distant Reading for European Literary History, a COST Action funded by the Horizon 2020 Framework. https://www.distant-reading.net/, Mar. 2020 [13] M. d. Marneffe, T. Dozat, N. Silveira ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
E-Connecting Balkan Languages

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva (2009)

In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. This tool relies for its work on several sophisticated textual and lexical resources that were developed for most of Balkan languages. These resources are based on several de facto standards in natural language processing.

Query expansion, e-dictionary, wordnet, proper name, aligned text

... 2005-02 de l’Institut Gaspard-Monge, CNRS, 2005. [4] T. Erjavec and N. Ide. The MULTEXT-East Corpus. In LREC’98, Granada, pp. 971-974, 1998. [5] A. Gelbukh, G. Sidorov, J.-A. Vera-Félix. A Bilingual Corpus of Novels Aligned at Paragraph Level. In proc. FinTAL-2006. Lecture Notes in Artificial ...
... compiled, from large corpora usually fully automatically prepared comprising from texts in some limited technical domain [18], to more versatile literary corpora [5] that are often more modest in size but minutely prepared. The main textual resource used to explore WS4LR is Jules Verne’s novel ...
... relation The results obtained by this query are very interesting and show by themselves the potential this tool offers for various linguistic and literary researches. This query retrieved 129 aligned segments, each of which contained at least one of the keywords from the produced query set in at ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
Indexing of textual databases based on lexical resources: A case study for Serbian

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2015)

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...

... for which we could have used the TreeTagger trained for Serbian that was used for the lemmatization of the Corpus of Contemporary Serbian [16]. However, this lemmatizer was trained on a corpus that differs significantly from our collection, and additionally it does not take into account MWUs. The approach ...
... much as possible [7]. These local grammars were organized in cascades that further resolve ambiguities [10]. NER system was evaluated on a newspaper corpus and results reported in [7] showed that F-measure of recognition was 0.96 for types and 0.92 for tokens. For the purpose of indexing, we applied ...
... Nikolić, V.: The Development of the GeolISSTerm Terminological Dictionary. INFOtheca 12(1), 49a–63a (August 2011) 16. Utvić, M.: Annotating the Corpus of contemporary Serbian. INFOtheca – Journal of Informatics & Librarianship 12(2), 36a–47a (2011) 17. Vossen, P.: EuroWordNet: a multilingual database ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian

Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek (2021)

syllables, rank-precision distribution, Slavic languages

Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek. "Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian" in Language and Text: Data, models, information and applications, John Benjamins Publishing Company (2021). https://doi.org/10.1075/cilt.356.04ruj
Речници у дигиталном добу - информатичка подршка за српски језик

Биљана Рујевић (2022)

Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Since they were stored as files whose number kept growing, which made managing the dictionaries difficult, a need arose to store the dictionary information in a lexicographic database. To enable several users to work simultaneously on dictionary development, a need arose for a web application based on the lexicographic database. In order to consider ...

electronic dictionaries, lexicographic database, lexical resources, Serbian language

Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
Keyword Extraction from Parallel Abstracts of Scientific Publications

Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić (2017)

... extraction method. The method is based on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the performance of the method across languages since we have the con- ...
... relations as edges (links). The weight of the link is proportional to the overall co-occurrence frequencies of the corresponding word pairs within a corpus. We will focus on the network construction around co-occurrence relations of adjacent words within sentences, since it requires no semantic or syn- ...
... relies on lexical resources for modeling various syntactic structures of multi-word terms. It is applied in several domains, also among them is the corpus of Serbian texts from the geology and mining domain containing more than 600,000 simple word forms. Part of this approach was the automatic elimination ...
Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
Integrisano okruženje za pripremu paralelizovanog korpusa

Ivan Obradović, Ranka Stanković, Miloš Utvić (2007)

Developing aligned corpora requires preparing parallel texts for integration into an aligned corpus. This is a complex task that can be solved in different ways and must proceed in several steps. This paper first describes the procedure for preparing parallel texts for the aligned corpus used by the Language Technologies Group of the University of Belgrade. It then gives a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in the process. The lack of a comfortable environment ...

... JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC'06, ELRA, Paris, 2006. [6] Tomaž Erjavec: Compiling and Using the IJS-ELAN Parallel Corpus. Informatica, 26(3), pp. 299-307, 2002. ...
... translation. The second module then converts the resulting documents into verticalized text. In the last step, the IMS Corpus Workbench (CWB) software package, developed at the University of Stuttgart, is used; it enables the creation of corpora with morphological and structural annotation, the indexing of texts ...
Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
Softverski alati za korišćenje resursa za srpski jezik

Ivan Obradović, Ranka Stanković (2008)

... Prolex should also be mentioned. Besides different types of e-dictionaries, the Group is engaged in developing other resources, such as the e-corpus of Serbian, as well as parallel multilingual corpora composed of parallel texts or bi-texts, usually comprising two texts of which one is original ...
... (Barzilay and McKeown, 2001). The Human Language Technology Group developed several aligned corpora, among them the largest one being the French-Serbian corpus which contains more than a million words (Vitas and Krstev, 2005). 3 WS4LR – a tool for maintenance and integrated use of lexical resources With ...
... tools. When the web is concerned, further development of WS4QE functions is underway, as well as the integration of developed functions and the corpus for Serbian language, which is also partly available on the web. Finally the development of a mobile application, for PDA devices and cell phones ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
Keyword-Based Search on Bilingual Digital Libraries

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović (2017)

This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić (2012)

This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...

multilingual digital libraries, query expansion, TMX

... primarily due to the growing needs of statistical machine translation. Thus, for example, the OPUS corpus offers freely available parallel corpora in many languages, as well as interfaces for querying the corpus data [Tiedemann, 2009]. Another example of a system that uses parallel corpora for information ...
... dictionaries of simple words and multi-word units [Krstev, 2008]. These comprehensive resources were developed and are being mainly used within two corpus processing systems: Unitex and Nooj. However, Unitex standalone routines enable the usage of morphological dictionaries developed under Unitex ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
