Search
2842 items
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that supports the collection, preparation, metadata description, management, and exploitation of documents from the domain of criminalistics written in Serbian, including full-text search. The proposed approach is applied on a web portal that gathers various texts from the journals of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the “Tara” and “Reiss” conferences, and some doctoral dissertations related to this field of research. After text processing, a corpus containing over 5,500 pages of plain text was created and ...
... Duško Vitas, Ivan Obradović, “The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines”, in Proceedings of the Sixth International Conference on Language ... To keep development and use of the applications and resources at the same time, without frequent conversions ...
... Obradović, “The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines”, in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 28-30 May 2008, European Language Resources Association (ELRA), 2008 ...
... shows a diagram of a use case for corpus preparation that includes: collection of articles, lexical processing resources, describing text with metadata, analysis of unknown words, complement morphological dictionaries, addition to terminology database, transliteration, correction of broken words, ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... language resources management, developed within the University of Belgrade HLT group (Stanković et al. 2011). The whole process is automated, and takes place with very little human intervention, starting from the tokenization and lexical analysis of a raw text up to production of dictionary ...
... acquisition and description using lexical resources and local grammars. In Proc. of the Conf. Terminology and Artificial Intelligence 2015, Granada: University of Granada, pp. 81--89. Malyszko, J., Abramowicz, W., Filipowska, A., & Wagner, T. (2015). Lemmatization of Multi-Word Entity Named for Polish ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech, and hate speech, has reached pandemic levels. A system able to detect such texts could help make the Internet and social media a better, more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
... enable the use of these resources on the Web is under development. As for the development of the lexical resources, we plan to prepare an ontology for the classification of abusive data, including tweets, to tackle ambiguity in hate speech detection [20]. The development of the lexicon of abusive words ...
... and their relations in the web of data. Moreover, it is used to make lexical data sets accessible via http(s), to publish them in accordance with W3C-standards such as RDF and SPARQL, and to provide links between lexical data sets and other LOD resources [8]. The goal of our research is to make its results ...
... automatic classification of abusive tweets and the first results are comparable with the results on similar data sets for other languages ([25, 44, 47]). The focus of our current research is the usage of a hybrid approach that combines machine learning and lexical resources. Finally, a user-friendly ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Building learning capacity by blending different sources of knowledge
... from various lexical resources. Author I 1 Introduction In our era of extremely rapid technological development, in many disciplines, especially those related to any form of engineering ...
... the search and browse functions of BMP. The BMP language support system, whose structure is outlined in Figure 3, is based on electronic language resources, namely, lexical resources, textual resources and grammars. The simplest multilingual lexical resources in general are bilingual dictionaries ...
... dictionaries. Morphological dictionaries of Serbian simple words and compounds in the so-called LADL format (Krstev et al., 2010) are thus a necessary part of the lexical resources used by the BMP language support system. Besides Serbian, such resources exist for many other languages, including ...
Ivan Obradović, Ranka Stanković, Olivera Kitanović, Dalibor Vorkapić. "Building learning capacity by blending different sources of knowledge" in International Journal of Learning and Intellectual Capital (2016). https://doi.org/10.1504/IJLIC.2016.075698
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020)
In science, industry, and many research fields, terminology evolves rapidly. Most often, the language that serves as the “lingua franca” for most of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain ...
... informacija;" (goal of finding information). A Software solution for multi-word units extraction displayed in Figure 2 offers possibilities for general NLP processing on selected corpus (applying lexical resources, generating bag of words and extraction of unknown words), extraction of selected syntactic ...
... scientific and technological domains. Purely manual production of terminological resources is not the solution due to rapid changes both in research fields and corresponding terminology. Multi-Word Expressions (MWEs) are lexical units composed of more than one word, which are syntactically, semantically ...
... is possible to compile a bilingual aligned terminological list. This paper is organised as follows. An overview of previous work on this topic is given in Section 2. Lexical resources and tools that were used in the experiments in Subsection 3. The proposed approach is thoroughly explained in Section ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
Possibilities of retro-digitalized German-Serbian Mining Dictionary
The paper describes the process of retro-digitization of the bilingual German-Serbian Mining Dictionary from 1923, authored by the mining engineer Dragutin Stepanović (Степановић, 1923). The dictionary is based on almost 4,000 lexical entries that are either translation equivalents or cross-references. In place of a preface, the author gives an insight into his letter addressed to the “Minister of Forests and Mines”, in which he writes of his intention to record words used among the people in order to avoid the use of German words. Although the number of headwords is not that large, the dictionary ...
Biljana Lazić, Olivera Kitanović, Ivan Obradović. "Possibilities of retro-digitalized German-Serbian Mining Dictionary" in E-dictionaries and E-lexicography, Zagreb, 10-11 May 2019, Zagreb : Institut za hrvatski jezik i jezikoslovlje (2019)
-
Building Terminological Resources in an e-Learning Environment
... a need to provide for all inflectional forms of terms, as they can be of importance for annotation of lessons. This morphological expansion is realized by the use of lexical resources [6] and the Vebranka web service [7]. The user can select which of the aforementioned term types will be mapped ...
... Obradovic, I., The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines in 6th LREC, Marrakech, Morocco, 2008. [8] Mitrović, A., Devedžić, V., A model of multitutor ontology-based learning environments, International Journal of Continuing Engineering Education ...
... , A population is the set of all individuals of interest in a particular study популација, Популација је скуп скуп свих индивидуа од интереса у неком истраживању. In the course of our experimenting we concluded that Moodle glossaries are not proper lexical resources, since the format offered ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)
... I. Obradović, “The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines”, in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, European Language Resources Association (ELRA), May ...
... the Serbian wordnet (SWN), a lexical database representing the semantic network of words in Serbian. Within this group of resources, the multilingual ontological dictionary of proper names Prolex should also be mentioned. V Besides these different types of dictionaries, the Group is ...
... available on the web as well. As the majority of functions performed by WS4LR and WS4QE overlap, in this section we shall describe only some of the basic functions of WS4LR related to management and development of individual resources. Integrated use of resources will be illustrated in the following ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... E-Dictionary of Compounds, The 28th Conf. on Lexis and Grammar, Bergen, 29th September - 3rd October 2009, In Arena Romanistica eds. B. Lamiroy et al, pp. 204-212, University of Bergen, Department of Foreign Languages. Laporte, E. and Monceaux, A. (1999). "Elimination of lexical ambiguities by ...
... Polish. In Proc. of the Workshop on Morphological Processing of Slavic Languages : 10th Conference EACL 2003, Budapest, Hungary, April 13th, 2003, eds. T. Erjavec and D. Vitas, pp. 33-40. Savary, A. (2008). Computational Inflection of Multi-Word Units – A Contrastive Study of Lexical Approach, In: ...
... language and content resources – Data Categories – Specification of data categories and management of a data category registry for language resources Kešelj, V., Kešelj, T., and Zlatić, L. (2004). R{j}ecnik.com: English-Serbo-Croatian electronic dictionary. In Proceedings of the Workshop on Enhancing ...
Cvetana Krstev, Ranka Stanković, Vitas Duško. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
-
Towards translation of educational resources using GIZA++
... TRANSLATION OF EDUCATIONAL RESOURCES USING GIZA++ IVAN OBRADOVIĆ University of Belgrade, Faculty of Mining and Geology, ivan.obradovic@rgf.bg.ac.rs DALIBOR VORKAPIĆ University of Belgrade, Faculty of Mining and Geology, dalibor.vorkapic@rgf.bg.ac.rs RANKA STANKOVIĆ University of Belgrade ...
... anyone, anywhere – regardless of what language they speak. 3. TRANSLATION OF EDUCATIONAL RESOURCES - CURRENT APPROACHES For translation of eLearning resources both language translation, and eLearning skills are necessary. The translation team needs knowledge of various software platforms and ...
... that stands in the way of the development of online courses as the majority of such courses are offered in English. Thus a growing need for translating MOOC content. The solutions provided so far have been fragmentary, human-based, and implemented off-line by the majority of course providers. [2] ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...
... Supporting resources Three types of lexical resources are used for the expansion of queries submitted to our collection of documents. The most important resources are Serbian morphological dictionaries of simple words and multi-word units [Krstev, 2008]. These comprehensive resources were developed ...
... composed of several modules as depicted in Figure 3. Targeted at textual resources in the form of collections of TMX documents and the corresponding metadata, the system has at its disposal several other lexical resources, such as morphological e-dictionaries. Together with the system of rules ...
... versatile handling of both monolingual and aligned or comparable texts. LeXimir provides for enhanced querying of aligned texts by using available lexical resources to perform semantic and morphological expansion of queries. The tool was, however, unsuitable for large collections of documents such ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... lemon based lexical database. In Proceedings of LREC, pages 18– W23. Tufiş, D., Koeva, S., Erjavec, T., Gavrilidou, M., and Krstev, C. (2009). Building language resources and translation models for machine translation focused on south slavic and balkan languages. Scientific results of the SEE-ERA ...
... Linguistics: Human Language Technologies, pages 271–281. Constant, M., Krstev, C., and Vitas, D. (2018). Lexical analysis of serbian with conditional random fields and large-coverage finite-state resources. In Zygmunt Vetulani, et al., editors, Human Language Technology. Challenges for Computer Science ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School
The first training school organized by the COST Action NexusLinguarum was held from 8 to 12 February 2021, with the aim of teaching students, researchers, and practitioners the basics of linguistic data science. During the training, participants were introduced to a wide range of topics: from the Semantic Web, RDF, and ontologies to modelling and querying linguistic data with state-of-the-art ontological models and tools. The school was held as part of the EUROLAN series of summer schools and was organized virtually (online) by several institutes; ...
linguistic data science, linked data in linguistics, linguistic data, EUROLAN, NexusLinguarum, COST Action, training school
... In Proceedings of the 2020 Globalex Workshop on Linked Lexicography, 1–9. Chiarcos, Christian, John McCrae, Philipp Cimiano, and Christiane Fellbaum. 2013. “Towards open data for linguistics: Linguistic linked data.” In New Trends of Research in Ontologies and Lexical Resources, 7–25. Springer. ...
... social event organization and opportunities to network. The knowledge and skills acquired there will improve the development of Serbian linguistic resources and help to publish more resources as linguistic linked data. Acknowledgment This paper is supported by the COST Action CA18209 - NexusLinguarum ...
... program of the school is available online. As a follow up, the JeRTeh Language Resources and Technologies Society set up a local installation of VocBench and, apart from JeRTeh members, it was used by students and teachers of the Intelligent Systems PhD program at the University of Belgrade for ...
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
-
Development of terminological resources for expert knowledge: a case study in mining
Ljiljana Kolonja, Ranka Stanković, Ivan Obradović, Olivera Kitanović, Aleksandar Cvjetić. "Development of terminological resources for expert knowledge: a case study in mining" in Knowledge Management Research & Practice, Palgrave Macmillan (2015). https://doi.org/10.1057/kmrp.2015.10
-
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usually Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions.
... a final goal to incorporate collected information into lexical resources for Serbian. In order to achieve these goals we have to deal with complex inflection of both Serbian MWUs and acronyms. We have followed these steps: 1. Extraction of pairs Acronym – Definition from a large corpus, where a ...
... techniques have not encountered them in training corpora, while those based on lexical resources do not have them listed in lexicons. However, their adequate treatment is crucial for many applications, e.g. text-to-speech systems (Taylor, 2009), machine translation (Wolinski et al., 1995), index- ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
-
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
The paper presents the results of research related to the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We provide an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization, and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data to RDF, and the incorporation of NIF annotations. The produced NIF files were evaluated by exploring the triplestore with SPARQL queries. Finally, the paper discusses the linking of Linked ...
parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata
Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
Towards Automatic Definition Extraction for Serbian
The paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential, and synonymous), which have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing suite of open ...
... microstructure of a SASA dictionary entry (Stijović and Stanković 2017). This made possible the design of a lexical database that can store a structured record of a dictionary article in a relational structure, and the development of a software solution that transforms the unstructured text of a Word document ...
... developed, this type of definition will be used in the models that still have to come to light. Local grammars that model definitions of nouns (and other types of words) will contribute to the creation of dictionaries and other lexical resources in various ways. For instance, when creating a dictionary ...
... that synthesize answers to questions of the type “What is…” based on one or more sources, and dictionary writing and ontology development (Navigli & Velardi 2010). The approaches to solving this problem are often based on the development and application of lexical-syntactic patterns. For example, in ...
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Integracija heterogenih tekstualnih resursa
Ranka Stanković, Ivan Obradović (2007)
The paper describes an approach to the integration of heterogeneous textual resources for Serbian with the help of a complex software tool developed specifically for this purpose. The structure and basic components of the developed system are described. The possibilities that the integrated environment offers for improving the resources through mutual exchange of information are also presented. Finally, the possible application of the integrated heterogeneous resources to query expansion, and to text search in general, is described, and some directions for further development are indicated.
... for Lexical Resources), which synchronously handles corpora of Serbian, multilingual aligned corpora, a system of morphological dictionaries for Serbian, the Serbian wordnet and the multilingual ontology of proper names Prolex. We describe the possibilities WS4LR offers for enhancement of these ...
... The diversity of textual resources University of Belgrade for many years, as well a a necessity of creating an appropriate tool whi ance, usage, further development and integration. In this paper we outline the structure and main components of the system we developed under the name of WS4LR (WorkStation ...
... Fellbaum, C. (Hg.). (1998): WordNet: An Electronic Lexical Database. Cambridge, Massachusetts: MIT Press. Krstev et al. 2006 – Krstev, C. et al. (2006): WS4LR: A Workstation for Lexical Resources. In: Proceedings of the 5 th Internationa Resources and Evaluation, LREC 2006. Genoa, May 2006. S. 1692–1697 ...
Ranka Stanković, Ivan Obradović. "Integracija heterogenih tekstualnih resursa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
-
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020)
While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated ...
... vol. 12 no. 2 pp 36a-47a, Dec. 2011 [7] M. Constant, C. Krstev, and D. Vitas “Lexical Analysis of Serbian with Conditional Random Fields and Large-Coverage Finite-State Resources”, Proc. 7th Language and Technology Conference (LTC), Poznan, Poland, Nov. 2015 [8] N. Ljubešić, F. Klubička, Ž. Agić ...
... typology,” Proc. Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [14] C. Krstev and D. Vitas, “Serbian Morphological Dictionary – SMD,” University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0, 2015 [15] A. Balvet, D. Stošić, and ...
... novel Enciklopedija Mrtvih [16], having 23,886 tokens in 946 sentences. III. TAGGING After the resources are ready, the process of tagging is made simple with the help of NLTK. There are plenty of tagger models packaged in NLTK that can be trained. Every tagger has an evaluation procedure that ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)