Search ⚒ Papers ⚒ DR RGF - RGF Repository

88 items

An Italian-Serbian Sentence Aligned Parallel Literary Corpus

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić (2023)

This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...

Aligned corpus, parallel corpus, Serbian, Italian, literature

Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
E-Connecting Balkan Languages

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva (2009)

In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. The tool relies on several sophisticated textual and lexical resources that have been developed for most Balkan languages. These resources are based on several de facto standards in natural language processing.

Query expansion, e-dictionary, wordnet, proper name, aligned text

... американски щати are connected automatically. 3. Using WS4LR with Aligned Texts The WS4LR module that works with aligned texts expects them to be in Translation Memory eXchange (TMX) format. It can also transform texts previously aligned by XAlign into that format but also in several other formats: ...
... visualization of aligned texts by applying appropriate XSLT transformations. Thus visualized texts user can freely browse. One such visualization is represented in Figure 1. Browsing, however, is not a particularly successful form of text exploration. WS4LR module for aligned texts offers users ...
... methodological framework was used for their development, and how they were integrated for their successful usage. 2.1 Textual Resources – Aligned Texts The aligned texts as a special form of multilingual corpora were in focus of many projects in past couple of decades. A systematic approach to the ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
Using English Baits to Catch Serbian Multi-Word Terminology

Cvetana Krstev, Branislava Šandrih, Ranka Stanković (2018)

In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...

aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection

... language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology ...
... extracted 846 different Serbian domain phrases, containing 515 Serbian phrases that were not present in the existing domain terminology. Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection 1. Motivation Terminology is rapidly developing in many ...
... Serbia, with the aim of presenting the librarianship terminology on different media (Kovačević et al., 2004). This resource was first used on aligned texts in query expansion (Stanković et al., 2012); the Excel format of the dictionary was at that time transformed into a relational database. The ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
Softverski alati za korišćenje resursa za srpski jezik

Ivan Obradović, Ranka Stanković (2008)

... parallel texts. In the majority of cases, parallel texts are being aligned, which turns a parallel text into an aligned text. Sometimes, it is even considered that parallel texts are the same as aligned texts, but this does not always have to be the case, since non-aligned parallel texts are also ...
... corpora composed of parallel texts or bi-texts, usually comprising two texts of which one is original, and the other its translation. The majority of these parallel texts are aligned, which means that relations are established between corresponding elements of both texts (paragraph, sentence, word) ...
... for a synset and its hypernyms 3.4 Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text alignment tool XAlign (Bonhomme et al., 2001). The module enables the transformation of texts aligned by XAlign into different formats: textual ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
Wordnet Development Using a Multifunctional Tool

Ivan Obradović, Ranka Stanković (2007)

In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...

Wordnet development, language resource integration, HLT tools

... 8. Aligned texts with highlighted words Another, more complex option is to use aligned texts. If PWN is used for the source synset, then the language of one of the parallel texts must be English. Namely, WS4LR allows the user to search aligned texts using words from both parallel texts. All ...
... module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool [3]. The module converts these texts to the Translation Memory eXchange (TMX) format, which is becoming the standard format for aligned texts. Figure 4 depicts the form ...
... of aligned parallel texts Parallel texts, which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph, sentence, etc) by matching the corresponding segments of the original and its translation. Aligned parallel texts are ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007 (2007)
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian

Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić (2020)

The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...

Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary

... performed for the production of Serbian MULTEXT-East resources (Krstev et al., 2004)). 2.2. Pre-annotated texts Various pre-annotated texts were used in this research for training and testing. These texts were tagged mainly using SMD (and its tagset) and the Unitex system [1], with manually performed d ...
... All texts had to be mapped to tagsets used by the existing tagger model TT11 and the two new tagger models TT19 and SerSpaCy (see Subsection 3.3.). Although most of the texts were tagged with SMD before mapping to some other tagset, the initial SMD version was not available for all texts (e.g. ...
... et al., 2006). It contains texts from law, health and education domains. Švejk, Floods, History are three short texts selected, respectively, from ... [Footnotes: 1. Unitex/GramLab, Cross Platform Corpus Processing Suite, https://unitexgramlab.org/ 2. The category of gender is relevant only for some verbal forms.]
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
Serbian NER&Beyond: The Archaic and the Modern Intertwinned

Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić (2021)

In this paper we present the Serbian literary corpus that is being developed under the umbrella of the COST Action "Distant Reading for European Literary History" CA16204. Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities, with a convolutional neural network (CNN), which has an F1 score of ≈91% on the test dataset. This model was further assessed on a separate evaluation dataset. We conclude with a comparison ...

... NEs in 1253 newspapers and similar texts. It was manually evaluated on a sample of unseen newspaper texts. The overall F1 score of the model was ≈ 96%. To the best of our knowledge, so far there were no attempts to produce a NER system for Serbian literary texts. The enhanced version of SrpNER was la- ...
... forms satisfactorily on similar texts, which can be seen from the model’s performance on the test set displayed in Table 3. Since this collection of novels contains very diverse texts, both lexically and syntactically, SrpCNNER did not generalize that well on unseen texts. 6 Conclusions and Future Work ...
... dubbed SrpNER, that we will describe in Section 2 together with some approaches to NE recognition in literary texts. This SrpNER model was applied to the raw version of the selected texts from SrpELTeC collection, presented in Section 3. Based on the specifically tailored guidelines, different evaluators ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
Keyword Extraction from Parallel Abstracts of Scientific Publications

Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić (2017)

... author(s), publication date, title, keywords, abstract etc.) and are aligned at the sentence level [15,16]. For the research presented in this paper, we used a collection of 50 bilingual documents with approximately 4,800 aligned sentences. Since papers were published bilingually, they were already ...
... English, where most of the papers were originally written in Serbian and then translated into English by professional translators. Texts have various lengths, in Serbian the texts contain from 34 to 259 words (on average 100) and in English from 44 to 286 words (on average 110). The statistics of the used ...
... of annotated keywords ranges from 3 to 18 in the Serbian and from 3 to 15 in the English texts (the average in both is 7). Scientists usually define keywords in their lemmatized form, while in the Serbian texts (and rarely in English) they appear in many inflected forms, which are different from lemma ...
Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Branislava Šandrih, Cvetana Krstev, Ranka Stanković (2019)

In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...

NER, Named Entity Recognition Systems, Serbian, Personal Names

... system. The important next step is the enhancement of our newspaper corpus with other types of text (Wikipedia articles, domain texts, literary texts). The literary texts would be particularly important for improving the recognition of first names. Finally, another intended step is Entity Linking ...
... by the considerably smaller number of these tags in training texts compared to other tags (see Table 4). As for SRPNER one can presume that developers devoted less effort to this entity type occurring only occasionally in newspaper texts. Similarly, in all experiment settings, the recognition of ...
... used for the first time for the recognition of personal names in Serbian texts. Ljubešić et al. (2013) used STANFORD NER to build models for Croatian and Slovene. When they used distributional similarity to improve results, on texts coming from different sources they obtained the following results: for ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
Bilingual lexical extraction based on word alignment for improving corpus search

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović (2019)

Library and Information Sciences, Computer Science Applications

Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić (2012)

This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...

multilingual digital libraries, query expansion, TMX

... proper name databases, which enables, among other things, versatile handling of both monolingual and aligned or comparable texts. LeXimir provides for enhanced querying of aligned texts by using available lexical resources to perform semantic and morphological expansion of queries. The tool ...
... for search of document collections consisting of aligned parallel texts converted in TMX (Translation Memory eXchange) format. TMX is an open XML-based standard intended for easier exchange of translation memory data, that is, aligned parallel texts, between tools and translation vendors [TMX ...
... development environment for generating aligned parallel texts. It is basically a front-end for two alignment tools developed by LORIA (Laboratoire lorrain de recherche en informatique et ses applications), one for automatic sentence alignment of texts (Xalign, http://led.loria.fr/outils/A ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey : European Language Resources Association (2012)
Old or New, We Repair, Adjust and Alter (Texts)

Cvetana Krstev, Ranka Stanković (2020)

In this paper we present how e-dictionaries and cascades of finite-state transducers, as implemented in Unitex, can be used to solve three text transformation problems: correction of texts after OCR, restoration of diacritics, and switching between different language variants.

text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers

... containing only texts written in Ekavian pronunciation and the other containing only texts written in Ijekavian pronunciation. In the case of multiple corrections, they are merged in one entry, as in the SRP_DR dictionary. Specific problems may arise with multiple corrections when transforming texts in either ...
... adjust and alter (texts) UDC 811.163.41’322.2: 004.9 DOI 10.18485/infotheca.2019.19.2.3 ABSTRACT: In this paper we present how e-dictionaries and cascades of finite-state transducers, as implemented in Unitex, can be used to solve three text transformation problems: correction of texts after OCR, restora- ...
Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
A Method for Extracting Translational Equivalents from Aligned Texts

Ivan Obradović (2013)

Ivan Obradović. "A Method for Extracting Translational Equivalents from Aligned Texts" in Methods and Applications of Quantitative Linguistics, Selected papers of the 8th International Conference on Quantitative Linguistics (QUALICO) in Belgrade, Serbia, April 26-29, 2012, Ivan Obradović, Emmerich Kelih, Reinhard Köhler (eds.), University of Belgrade & Academic Mind (2013): 119-129
An Integrated Environment for Management and Exploitation of Linguistic Resources

Ranka Stanković, Ivan Obradović (2009)

... possibility of adding hypernym literals. D. Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text alignment tool XAlign. The module enables the transformation of texts aligned by XAlign into different formats: textual ...
... is publicly available [3]. C. Parallel and aligned texts Although monolingual parallel texts exist, parallel texts are as a rule bilingual, composed of one original text and its translation into another language. Thus, they represent two texts having the same content, but in two different ...
... different languages. The majority of parallel texts collected within the HLT Group are aligned, with Serbian most often being one of the languages. The procedure of transforming parallel texts into aligned texts followed two basic steps with the goal of connecting equivalent segments ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
The Nooj System as Module within an Integrated Language Processing Environment

Ranka Stanković, Duško Vitas, Cvetana Krstev (2008)

NooJ, electronic dictionary, lexical resources

... alignment of multilingual texts. WS4LR handles aligned texts as well. A pair of semantically equivalent texts in different languages, such as an original text and its translation, that are aligned on a structural level (paragraph, sentence, phrase, etc.) is known as an aligned text or bitext. One ...
... WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool (Bonhomme 2001). Parallel texts which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph ...
... translation. The module converts these texts to the Translation Memory eXchange (TMX) format, which is becoming the standard format for aligned texts. Figure 7 depicts the form with different possibilities for TMX document management. Aligned texts can be visualized in various ways by choosing ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
WS4LR - a Workstation for Lexical Resources

Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović (2006)

Lexical Resources, Wordnet, Serbian

... in Appendix B. 2.3 Aligned Texts A pair of semantically equivalent texts in different languages, such as an original text and its translation, that are aligned on a structural level (paragraph, sentence, phrase, etc.) is known as an aligned text or bitext. Aligned texts are usually constructed ...
... chosen synset in a text, with or without synset hypernyms. 3.4 Working with Aligned Texts The module uses texts which have previously been aligned using Xalign as an alignment tool and converts them to TMX format, or texts that are already in that format. By choosing the appropriate XSLT stylesheet ...
... step, the texts to be aligned are segmented into equivalent units, and in the second step the correspondence between these units is established. The equivalent units are usually sentences, but the units can be larger, as well as smaller. The standard method for representing aligned texts is the ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis

Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović (2017)

This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied on a web portal that gathers various texts originating from the journal of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, and the "Tara" and "Reiss" conferences, as well as from some doctoral dissertations related to this field of research. After text processing, a corpus containing over 5500 pages of plain text was created and ...

Omeka, Wordnet, full-text search, morphological and semantic text search, query expansion

... FORENSIC LINGUISTICS The linguistic study of forensic texts is a part of the field of Natural Language Processing, which includes text types classification and syntax and semantic analysis of texts written in a natural language. Various texts are subject of the study: Acts of Parliament (or other ...
... l dictionaries cover large lexica, but each special domain has characteristic words that occur only occasionally in ordinary texts, but frequently in domain specific texts. That is the case with presented collection. Among unrecognized tokens were terms: [18] I. Obradović, R. Stanković, “Wordnet ...
... and „upad“ (intrusion) have negative sentiment polarity scores (0.75 and 0.125) respectively, which makes it possible to classify texts containing these terms as „forensic texts“. [19] Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, and Aleksandra Trtovac, “Rule-based Automatic ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
Keyword-Based Search on Bilingual Digital Libraries

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović (2017)

This paper outlines the main features of Biblisha, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. Keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...

Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
Vebran Web Services for Corpus Query Expansion

Ranka Stanković, Miloš Utvić (2020)

This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.

corpus search, web service, Serbian lexical resources, query expansion

... paragraph, sentence) are annotated in some particular corpus texts, especially those which are part of aligned corpora. The SrpKor2013 corpus is used by more than 700 users, mostly Slavists. 2.2 RudKor Systematic collection and preparation of texts from the mining domain started with English-Serbian alignment ...
... 122 million corpus words. It includes literary texts of Serbian writers in the XX and XXI centuries, as well as scientific and popular science texts from different domains (natural and social sciences), administrative and general texts. The general texts represent articles from the daily newspapers ...
... subset SrpLemKor; – SrpEngKor, aligned English-Serbian corpus including subcorpus SELFEH (Serbian-English Law Finance Education and Health) with documents on finance, health, law and education; – SrpFranKor, aligned French-Serbian corpus; – SrpNemKor, aligned German-Serbian corpus; – RudKor, a ...
Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
A Lexical Approach to Acronyms and their Definitions

Cvetana Krstev, Duško Vitas, Ranka Stanković (2015)

In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions, which are usually Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization, and production of entries in morphological electronic dictionaries, both for MWUs and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables a representation that reflects the complex relations between acronyms and their definitions.

... contains 70% of newspaper texts (57% daily, 8% weekly and 5% monthly newspapers) and 6% of monographs and textbooks (Krstev and Vitas, 2005), which are types of texts that tend to use acronyms and provide definitions. Besides that we used two more samples of newspaper texts (having 600 thousand and ...
... Sgarbas, and S. Panagiotopoulou, 2014. Acronym identification in Greek legal texts. Literary and Linguistic Computing, 30(3):440–541. Wolinski, F., F. Vichot, and B. Dillet, 1995. Automatic processing of proper names in texts. In Proceedings of the 7th conference on European chapter of the ACL. Morgan ...
... ac.rs, †ranka@rgf.bg.ac.rs Abstract In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
