Претрага ⚒ Радови ⚒ Др РГФ - Репозиторијум РГФ

Претрага

Per page

Sort by

101 items

E-Connecting Balkan Languages

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva (2009)

In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. This tool relies for its work on several sophisticated textual and lexical resources that were developed for most of Balkan languages. These resources are based on several de facto standards in natural language processing.

Query expansion, e-dictionary, wordnet, proper name, aligned text

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009) M33
An Italian-Serbian Sentence Aligned Parallel Literary Corpus

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić (2023)

This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...

Aligned corpus, parallel corpus, Serbian, Italian, literature

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388 М53
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis

Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović (2017)

U ovom radu predstavljen je model koji omogućava prikupljanje, pripremu, opis metapodataka, upravljanje i eksploataciju, uključujući pretragu punog teksta dokumenata iz domena kriminalistike napisanih na srpskom jeziku. Predloženi pristup primenjuje se na veb portalu koji sakuplja različite tekstove nastale iz časopisa Akademije za kriminalistiku i policijske studije, Krivičnog zakona Srbije, konferencija „Tara“ i „Reiss“, kao i iz nekih doktorskih disertacija vezanih za ovu oblast istraživanje. Nakon obrade teksta, korpus koji sadrži preko 5500 stranica običnog teksta, kreiran je i ...

Omeka, Wordnet, pretraga punog teksta, morfološka i semantička pretraga teksta, proširenje upita

Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017) М33
Using English Baits to Catch Serbian Multi-Word Terminology

Cvetana Krstev, Branislava Šandrih, Ranka Stanković (2018)

In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...

aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inﬂection

Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018) M33
The Effects of Multi-Word Tagging on Text Disambiguation

Utvić Miloš, Obradović Ivan, Krstev Cvetana, Vitas Duško (2010)

Utvić Miloš, Obradović Ivan, Krstev Cvetana, Vitas Duško. "The Effects of Multi-Word Tagging on Text Disambiguation" in Proceedings of the 29th International Conference on Lexis and Grammar, LGC 2010, September 2010, Belgrade, Serbia, D. Vitas and C. Krstev (eds.), Belgrade:Faculty of Mathematics, University of Belgrade (2010): 333-342 M63
It-Sr-NER: CLARIN Compatible NER and Geoparsing Web Services for Italian and Serbian Parallel Text

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić (2023)

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić. "It-Sr-NER: CLARIN Compatible NER and Geoparsing Web Services for Italian and Serbian Parallel Text" in Linköping Electronic Conference Proceedings, Linköping University Electronic Press (2023). https://doi.org/10.3384/ecp198010 М33
It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić (2023)

The paper will present the results of the project `“It-Sr-NER: Web services for named entities recognition, linking and mapping,” in which teams from the University of Turin and the Society for Language Resources and Technologies JeRTeh participated, and whose goal was the development of the It-Sr-NER web service for named entity annotations in the text and displaying them on the map. Named entities in these services are names of persons, places, organizations, demonyms (ethnicities), events and works of art.

General Engineering

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić. "It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map" in Infotheca, Belgrade : Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.3 М53
The Dictionary of the Serbian Academy: from the Text to the Lexical Database

Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo (2018)

In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encoding was used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text of and transform it into structured data stored in relational lexical database. This approach ...

computer lexicography, lexical database, language resources, dictionary, Serbian language

Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018) M33
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons

Mihailo Škorić (2017)

The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...

data mining, information extraction, emotions, text on the web

Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4 М53
Old or New, We Repair, Adjust and Alter (Texts)

Cvetana Krstev, Ranka Stanković (2020)

U ovom radu predstavljamo kako se e-rečnici i kaskade transduktora konačnih stanja implementirani u alatu Unitex mogu koristiti za rešavanje tri problema transformacije teksta: ispravljanje tekstova nakon OCR-a, vraćanje dijakritičkih znakova i prebacivanje između različitih jezičkih varijanti.

ispravka teksta, OCR greške, restauracija dijakritika , jezičke varijante, elektronski rečnik, transduktori konačnih stanja

Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3 М53
SrpELTeC: A Serbian Literary Corpus for Distant Reading

Ranka Stanković, Cvetana Krstev, Duško Vitas (2024)

U članku je predstavljen SrpELTeC, korpus razvijen u okviru akcije COST Distant Reading for European Literary History (CA16204). Svi romani u SrpELTeC-u su odabrani, pripremljeni i obeleženi korišćenjem zajedničkih principa uspostavljenih za sve jezičke zbirke u Evropskoj zbirci književnog teksta (ELTeC). Navedeni su izazovi i rešenja u pripremi SrpELTeC od nule. Svi romani su ručno kodirani u TEI sa bogatim metapodacima i strukturnim napomenama. Automatska anotacija je uključivala POS-označavanje, lematizaciju i imenovane entitete, oslanjajući se na resurse za obradu ...

digital humanities, Serbian literature, text corpora, distant reading , linked data, named entity recognition, text analytics

Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03 М23
Semantic Textual Similarity of Courses Based on Text Embeddings

Olivera Kitanović, Aleksandra Tomašević, Mihailo Škorić, Ranka Stanković, Ljiljana Kolonja (2024)

This paper explores the application of textual embeddings to measure semantic similarity between educational courses’ curriculums, aiming to enhance the effectiveness of the next faculty accreditation. Leveraging state-of-the-art natural language processing techniques, we employ pre-trained embeddings to capture the semantic meaning of course descriptions. Our methodology involves transforming course curriculum texts into high-dimensional vector representations, enabling efficient and meaningful comparisons. We evaluate the proposed approach on a diverse dataset of course descriptions, employing established benchmarks for semantic textual similarity ...

Olivera Kitanović, Aleksandra Tomašević, Mihailo Škorić, Ranka Stanković, Ljiljana Kolonja. "Semantic Textual Similarity of Courses Based on Text Embeddings" in Lecture Notes in Networks and Systems, Springer Nature Switzerland (2024). https://doi.org/10.1007/978-3-031-71419-1_27 М33
From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)

Milica Ikonić Nešić, Ranka Stanković, Christof Schöch and Mihailo Škorić (2022)

In this paper we present the wikification of the ELTeC (European Literary Text Collection), developed within the COST Action ``Distant Reading for European Literary History'' (CA16204). ELTeC is a multilingual corpus of novels written in the time period 1840—1920, built to apply distant reading methods and tools to explore the European literary history. We present the pipeline that led to the production of the linked dataset, the novels’ metadata retrieval and named entity recognition, transformation, mapping and Wikidata population, ...

Wikidata, linked data, SPARQL, distant reading, literary corpus, named entity linking, ELTeC

Milica Ikonić Nešić, Ranka Stanković, Christof Schöch and Mihailo Škorić. "From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)" in Proceedings of The 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022) М33
Towards ELTeC-LLOD: European Literary Text Collection Linguistic Linked Open Data

Ranka Stanković, Christian Chiarcos, Miloš Utvić, Olivera Kitanović (2023)

Овај рад описује студију случаја о генерисању повезаних података креираних на основу обечежених текстуалних корпуса коришћењем формата размене података у обради природних језика (NIF). Као основа за ово истраживање послужио је подскуп корпуса ELTeC, који се састоји од 900 романа из периода 1840-1920 за 9 европских језика. Верзија романа са коментарима, у такозваном TEI level-2 формату, трансформисана је у NIF, формат заснован на RDF/OWL који има за циљ постизање интероперабилности између алата за обраду природних језика, језичких ресурса и ...

повезани отворени подаци, корпус, SrpELTeC, NIF

Ranka Stanković, Christian Chiarcos, Miloš Utvić, Olivera Kitanović. "Towards ELTeC-LLOD: European Literary Text Collection Linguistic Linked Open Data" in LDK 2023 – 4th Conference on Language, Data and Knowledge, 12-15 September in Vienna, Austria, Lisabon : NOVA FCSH - CLUNL (2023). https://doi.org/10.34619/srmk-injj М33
A Mathematical Learning Environment Based on Serbian Language Resources

Radojičić Marija, Obradović Ivan, Stanković Ranka, Utvić Miloć, Kaplar Sebastijan (2018)

In recent years, in line with ever growing usage of Information technology, the learning environments are changing. The amount of available learning materials in various forms has increased. These new environments demand comprehensive learning systems, which enable management of the learning corpus with special attention paid to relevant lexical resources. In this paper we present the concept of a Mathematical Learning Environment in Serbian (MLES), which is based on a corpus of mathematical materials and various lexical resources, enabling ...

mathematical content, text processing, mathematical formulae

Radojičić Marija, Obradović Ivan, Stanković Ranka, Utvić Miloć, Kaplar Sebastijan. "A Mathematical Learning Environment Based on Serbian Language Resources" in Proceedings of the 7th International Scientific Conference Technics and Informatics in Education, Faculty of Technical Sciences, Čačak (2018) М33
Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić (2022)

In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of re-thinking the European literary history. We present the various steps that led to the production of the Serbian sub-collection: the novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published ...

Corpus, Distant Reading, Digital Humanities, Linked Data, Named Entity Recognition, Text Analytics

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić. "Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection" in Proceedings of the Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022) М33
Transformer-Based Composite Language Models for Text Evaluation and Classification

Mihailo Škorić, Miloš Utvić, Ranka Stanković (2023)

Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660 М21а
Keyword-Based Search on Bilingual Digital Libraries

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović (2017)

This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in bilingual digital library. Biblishsa supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded, both semantically, morphologically and in other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10 M14
Measuring semantic relevance of words in synsets

Obradović Ivan, Krstev Cvetana, Vitas Duško (2010)

Obradović Ivan, Krstev Cvetana, Vitas Duško. "Measuring semantic relevance of words in synsets" in Text and Language, Structures · Functions · Interrelations. Quantitative Perspectives, P. Grzybek, E. Kelih, J. Mačutek (eds.), Wien:Praesens Verlag (2010): 133-144 M14
New Language Models for South Slavic Languages

Mihailo Škorić (2024)

Izlaganje će predstaviti izazove i perspektive modelovanja južnoslovenskih jezika, sa posebnim osvrtom opšte jezičke modele građene na arhitekturi transformera (BERT, GPT), na dostupne skupove tekstova za obučavanje tih modela, te kvantitet i kvalitet tih skupova. Izlaganje će ponuditi pregled dostupnih skupova i modela, dok će posebna pažnja biti posvećena najnovijim korpusima tekstova. Prvi korpus, Kišobran, predstavlja krovni veb korpus južnoslovenskih jezika i ujedno trenutno najveći korpus tekstova na našim prostorima koji broji preko osamnaest milijardi reči i uključuje sve ...

veliki korpusi teksta, jezički modeli, južnoslovenski jezici

Mihailo Škorić. "New Language Models for South Slavic Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024) М64

Претрага

101 items

E-Connecting Balkan Languages cite

An Italian-Serbian Sentence Aligned Parallel Literary Corpus cite

Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis cite

Using English Baits to Catch Serbian Multi-Word Terminology cite

The Effects of Multi-Word Tagging on Text Disambiguation cite

It-Sr-NER: CLARIN Compatible NER and Geoparsing Web Services for Italian and Serbian Parallel Text cite

It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map cite

The Dictionary of the Serbian Academy: from the Text to the Lexical Database cite

Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons cite

Old or New, We Repair, Adjust and Alter (Texts) cite

SrpELTeC: A Serbian Literary Corpus for Distant Reading cite

Semantic Textual Similarity of Courses Based on Text Embeddings cite

From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back) cite

Towards ELTeC-LLOD: European Literary Text Collection Linguistic Linked Open Data cite

A Mathematical Learning Environment Based on Serbian Language Resources cite

Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection cite

Transformer-Based Composite Language Models for Text Evaluation and Classification cite

Keyword-Based Search on Bilingual Digital Libraries cite

Measuring semantic relevance of words in synsets cite

New Language Models for South Slavic Languages cite

E-Connecting Balkan Languages

An Italian-Serbian Sentence Aligned Parallel Literary Corpus

Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis

Using English Baits to Catch Serbian Multi-Word Terminology

The Effects of Multi-Word Tagging on Text Disambiguation

It-Sr-NER: CLARIN Compatible NER and Geoparsing Web Services for Italian and Serbian Parallel Text

It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map

The Dictionary of the Serbian Academy: from the Text to the Lexical Database

Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons

Old or New, We Repair, Adjust and Alter (Texts)

SrpELTeC: A Serbian Literary Corpus for Distant Reading

Semantic Textual Similarity of Courses Based on Text Embeddings

From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)

Towards ELTeC-LLOD: European Literary Text Collection Linguistic Linked Open Data

A Mathematical Learning Environment Based on Serbian Language Resources

Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection

Transformer-Based Composite Language Models for Text Evaluation and Classification

Keyword-Based Search on Bilingual Digital Libraries

Measuring semantic relevance of words in synsets

New Language Models for South Slavic Languages