Search
51 items
-
Sentiment Analysis of Serbian Old Novels
In this paper we present the first study of Sentiment Analysis (SA) of Serbian novels from the 1840-1920 period. The sentiment lexicon was prepared from three existing lexicons, NRC, AFINN, and Bing, with extensive additional corrections. In the first phase of dataset refinement, words not found in the Serbian morphological dictionary were filtered out; in the second, the automatically assigned POS tags and lemmas were manually corrected. The polarity lexicon was extracted, transformed into OntoLex-Lemon, and published as initial ...
Ranka Stanković, Miloš Košprdić, Milica Ikonić Nešić, Tijana Radović. "Sentiment Analysis of Serbian Old Novels" in Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data, June 2022, Marseille, France, European Language Resources Association (2022)
-
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of old Serbian novels from the 1840-1920 period, using the Mistral large language model (LLM) with zero-shot and few-shot learning techniques. The main approach innovates by devising research prompts that combine the text with classification instructions, either without examples or with a few examples, enabling the language model to classify sentiment into positive, negative, or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Sofia, Bulgaria, 9-10 September 2024, LREC | COLING (2024)
-
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents our work on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and lexicon creation. The connection to other lexical and semantic resources for Serbian is highlighted, and the construction of a system for ...
... multilingual online lexicon of hate speech available at hatebase.org in their research (Wiegand et al., 2018; Silva et al., 2016; Nobata et al., 2016). Wiegand et al. (2018) built a lexicon of abusive words using the subjectivity lexicon of Therese Wilson that is in essence a sentiment lexicon. They took words ...
... the abusive words lexicon development, we plan to use: lists of slurs, abusive expressions, and curses built by conducting surveys and crowdsourcing (Mitrović et al., 2015), slang and dictionaries of synonyms, translation of existing lexicons in other languages, sentiment lexicon for Serbian language ...
... addition, we plan to include the context rules and intensifiers following the approach presented in (Moreno-Ortiz et al., 2013) about the MWEs sentiment lexicon for Spanish. Additional attention will be given to the extension of the vocabulary with expressions that are not present in any existing lexicons ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
-
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos (2024)
We present ongoing activities on defining a lexicon-corpus interface that will serve as a reference for representing multiword expressions (of various types: nominal, verbal, etc.) in dedicated lexicons and for linking these entries to their occurrences in corpora. The final aim is to use such resources for the automatic identification of multiword expressions in text. Including several natural languages aims at a universal solution that is not centered on a particular language, while also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword ...
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos. "Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanities, derogatory terms, and hate speech, has reached the level of a pandemic. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents the work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
... the creation of a lexicon of offensive words are lists of swear words, curses, abusive expressions, existing general dictionaries, slang dictionaries, surveys and contributions through crowdsourcing, translation of dictionaries and lexicons from other languages, lexicons of sentiment words and expressions ...
... 2023-10-14 04:19:42 A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih Digital Repository of the Faculty of Mining and Geology, University of Belgrade [ДР РГФ] A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian ...
... present an abusive speech lexicon structure and its enrichment with abusive triggers extracted from the AbCoSER dataset. 2012 ACM Subject Classification Computing methodologies → Natural language processing Keywords and phrases abusive language, hate speech, Serbian, Twitter, lexicon, corpus Digital Object ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language-dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, a sentiment lexicon, a lexicon of markers, and a WordNet-based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...
... and phrases that carry positive sentiment polarity. We have used in this research the sentiment lexicon developed for sentiment analysis and described in [24]. The lexicon contains 4,593 entries with sentiment polarity values. Lexicon of irony markers (resource B, Fig. 1) which consists of 62 phrases ...
... morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity ...
... detect the occurrence of irony is a lexicon of sentiment words and phrases in Serbian (resource C, Fig. 1). Keeping in mind the nature of the rhetorical figure verbal irony which is used to portray a negative statement in the form of a positive one, using the sentiment lexicon we can detect words and ...
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating low- and medium-resourced European languages. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and a comparison with annotations in other languages. We discuss the first steps ...
Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were in turn used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on ...
... spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on two sample texts: a part of the gold standard and an independent newspaper text of approximately the same size. The results show that rule- and lexicon-based system outperforms trained models in all four ...
... Stanković University of Belgrade Faculty of Mining and Geology Belgrade, Serbia ranka@rgf.bg.ac.rs Abstract In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal ...
... NER), spaCy (Honnibal and Montani, 2017) (module written in Python, used for advanced NLP) and many others. For Serbian, thus far a rule-based and lexicon-based NER system was developed – SRPNER (Krstev et al., 2014). Its development started with the recognition of a NE class present in all NE schemes ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... likely Part-of-Speech tag” and “simply concatenates lemma from a full lexicon, which corresponds to the chosen Part-of-Speech. Hence, word forms with the same Part-of-Speech, but different lemma cannot coexist in the full lexicon.” A new TreeTagger was produced for this research – TT19, based on ...
... difference being the set of resources used for training. Both the training corpus and the lexicon were expanded. Several smaller annotated corpora were added to Intera: 1984, Švejk and Floods, and the lexicon was expanded to over 2.1+ million tokens (including punctuation and other non-alphanumeric ...
... tagging (PoS-tagging). PoS-tagging precedes many other Natural Language Processing tasks, such as Text Classification, Named Entity Recognition, Sentiment Analysis, Question Answering, etc. Computer programs that perform this task, the so-called ‘taggers’, can be based on lookup-tables, regular expres- ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School
The first training school organized by the COST Action NexusLinguarum was held from 8 to 12 February 2021, with the aim of teaching students, researchers, and practitioners the basics of linguistic data science. During the training, participants were introduced to a broad range of topics: from the Semantic Web, RDF, and ontologies to modelling and querying linguistic data with state-of-the-art ontological models and tools. The school was held as part of the EUROLAN series of summer schools and was organized virtually (online) by several institutes; ...
Keywords: linguistic data science, linked data in linguistics, language data, EUROLAN, NexusLinguarum, COST Action, training school
... 2014)), lexicog – lexicography module (Bosque-Gil, Gracia, and Montiel- 6. Data Catalog Vocabulary (DCAT) - Version 2 7. Lemon - Lexicon Model for Ontologies; Lexicon Model for Ontologies: Community Report, 10 May 2016 8. SKOS Simple Knowledge Organization System - home page 9. Protégé 10. VocBench: ...
... Belgrade for the subjects Knowledge representation and Semantic web. The Lemon-OntoLex Frac module was used for representation of the entries from the lexicon used for abusive speech detection with attestations from the Twitter corpus with annotation of abusive spans (Jokić et al. 2021). 3 Organization ...
... al Semantic Web Conference, 98–113. Springer. Jokić, Danka, Ranka Stanković, Cvetana Krstev, and Branislava Šandrih. 2021. “A Twitter Corpus and lexicon for abusive speech detection in Serbian.” In Proceedings of the 2021 Language, Data and Knowledge (LDK), 1-3 September in Zaragoza, Spain. McCrae ...
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for morphologically complex languages such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... in the domain of economy is presented for Polish. It has two modules: a grammatical lexicon of terminological MWEs and a fully lexicalized shallow grammar, obtained by an automatic conversion of the lexicon. Przepiorkowski and associates (2007) present results of automatic extraction of term ...
... Heidelberg. Savary, A., Zaborowski, B., Krawczyk-Wieczorek, A. & Makowiecki, F. (2012). SEJFEK - a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units. Proc. of Cognitive Aspects of the Lexicon (COGALEX-III). (pp. 195-214). Zhang, Y., Kordoni, V., Villavicencio, A., & Idiart, M. (2006). ...
... extraction grammars. Proc. of the 3rd Language & Technology Conference. Quochi, V., Frontini, F., & Rubino, F. (2012). A MWE Acquisition and Lexicon Builder Web Service. Proc. of COLING 2012 (pp. 2291-2306). Ramisch, C., De Araujo, V., & Villavicencio, A. (2012). A broad evaluation of techniques ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the problems encountered for Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... –http://unitexgramlab.org/ Figure 1: Data categories (markers) dictionary. The main class of the core of the lexicon model is the class LexicalEntry, representing a unit of analysis of the lexicon, which encompasses a set of inflected forms that are grammatically related, and a set of base meanings that ...
... were automatically improved and enriched by introducing new lexical entries and/or lexical relations, and by checking the existing ones. An NLP lexicon has little in common with human-oriented e-dictionary. Data structures in these two types of e-dictionaries are quite different. However, it proved ...
... implemented, neither for lexical database development nor for further processing (Stanković et al., 2013). Finally we considered the lemon model (Lexicon Model for Ontologies), which was derived from LMF, and has been designed for ontology lexicons on the Semantic Web. It is aimed at enriching the ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020)
This paper discusses the development of the Vebran web services and their application to improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.
... format of a TreeTagger full-form lexicon. Each entry of the TreeTagger full-form lexicon contains one word form and a sequence of tag-lemma pairs that could correspond to that word form (Schmid, 1997). The TreeTagger full-form lexicon does not allow a lexicon entry with two or more tag-lemma ...
... have homograph word forms (tati, tatom, tate, tatu, tata), so that lexicon entries with these forms cannot contain both tag-lemma pairs (N, tat) and (N, tata), where N is the PoS tag denoting a noun. Thus, the creator of a full-form lexicon has to choose which tag-lemma pair to keep, and the choice is commonly ...
Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
-
Football terminology: compilation and transformation into OntoLex-Lemon resource
This paper presents an ongoing project, the creation of the first digital football dictionary in Serbian, and demonstrates the application of the OntoLex model and its modules. The OntoLex-FrAC module includes information on frequency and usage examples extracted from a corpus. For this purpose, a domain-specific corpus named SrFudKo (СрФудКо) was created, containing football news articles in Serbian. Multi-word terms were automatically extracted from the Serbian corpus and then manually evaluated and classified as sports-related or ...
Jelena Lazarević, Ranka Stanković, Mihailo Škorić, Biljana Rujević. "Football terminology: compilation and transformation into OntoLex-Lemon resource" in LDK 2023 – 4th Conference on Language, Data and Knowledge, 12-15 September in Vienna, Austria, Lisabon : NOVA FCSH - CLUNL (2023). https://doi.org/10.34619/srmk-injj
-
Fourth Summer Datathon on Linguistic Linked Open Data
Tijana Radović, Ranka Stanković (2023)
The 4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22) was held in Cercedilla, near Madrid, Spain, in May 2022, and was organized by the COST Action NexusLinguarum. The school gathered researchers, academics, and students who wanted to acquire or expand their knowledge in the field of linguistic linked data science. During the school, a spectrum of topics from the field of linked data was presented, from various ontologies, through document integration, annotation and natural language text processing tools ...
Tijana Radović, Ranka Stanković. "Fourth Summer Datathon on Linguistic Linked Open Data" in Infotheca, Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.6
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management, and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied to a web portal that gathers various texts drawn from the journal of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the "Tara" and "Reiss" conferences, and some doctoral dissertations in this research area. After text processing, a corpus containing over 5,500 pages of plain text was created and ...
... negative) used for each SWN synset. A sentiment lexicon is produced using word forms defined in SWN that have positive or negative sentiment scores. This kind of a lexicon is applied in sentiment polarity classification tasks on Serbian texts, achieving 97.1% accuracy over cross-validated datasets ...
... execution, member states, trafficking). SENTIMENT ANALYSIS AS A NEXT STEP IN A STUDY OF FORENSIC TEXTS The semantic network Serbian WordNet (SWN) is a lexico-semantic resource that has been developed based on the idea of the Princeton WordNet (PWN), a mental lexicon that helps scientists working on ...
... concludes with Sentiment Analysis as a next step in a study of forensic texts. REFERENCES 1. Cvetana Krstev. Processing of Serbian – Automata, Text and Electronic Dictionaries, Faculty of philology, Belgrade, 2008. 2. Cvetana Krstev, Duško Vitas, “Corpus and Lexicon - Mutual Incompletness ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... Savary, A., Zaborowski, B., Krawczyk-Wieczorek A., and Makowiecki F. (2012). SEJFEK - a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units. In Proc. of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), COLING 2012, Mumbai: COLING, pp. 195-214. Schone, P., Jurafsky, ...
... technical corpus, using a cascade of transducers (Ammar et al., 2015). Another example of this approach, SEJFEK, consisting of a grammatical lexicon of about 11,000 Polish MWTs from the economical domain, where inflectional and syntactic variations are described via graph-based rules, is described ...
... P., Hindle, D. (1991). Using statistics in lexical analysis, In U. Zernik (Ed.), Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 115-164. Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23-28 May 2016, European Language Resources Association (2016)
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020)
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...
... linked to the Princeton Wordnet. The latter resource, SIMPLE, constitutes the semantic level of a quadripartite Italian lexicon. Its structure is inspired by Generative Lexicon theory (Pustejovsky, 1995) and in particular the notion of qualia structure which is used to organise the Semantic Units ...
... dictionary made it possible to combine verb groups and dictionary valency information, used as input for the compilation of the Danish FrameNet Lexicon (Nimb, 2018). Furthermore, they constitute the basis for the automatically integrated information on related words in DDO, on the fly for each ...
... Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, pages 915–923. Nimb, S. (2018). The Danish FrameNet lexicon: method and lexical coverage. In Proceedings of the International FrameNet Workshop at LREC 2018: Multilingual FrameNets and Constructions, pages ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)
In this paper, we present an approach to transforming Serbian morphological dictionaries from the DELA text format into a lexical database dubbed Leximirka. Considering the benefits of storing data in a database rather than in textual documents, we outline some of the functionality that the database has made possible. We also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ...
... information is in the MorfPattern table, while the information about the dictionary to which the lexical entry belongs is in the Lexicon table. For one entry in the Lexicon table, that is one dictionary, one or more records of the LexicalEntry table are connected. This means that one or more lexical entries ...
... Lemon model is concise, descriptive, modular and RDF based. At the time of making Leximirka database, Lemon model consisted of five modules: Ontology-lexicon interface – ontolex, Syntax and Semantics – synsem, Decomposition – decomp, Variation and Translation – vartrans and Linguistic Metadata – lime. ...
... "+PR", toponyme "+Toponyme" and city "+Ville". The name of the dictionary that contains this lexical entry "Prolex-Unitex.dic" is stored in the table Lexicon. These tables are shown in the Figure 1. The form "Paris" is the same for singular and both male and female gender and it is stored in the Form table ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Geologic Information System of Serbia
The Geologic Information System of Serbia (GeolISS) is a repository for the digital archiving, querying, retrieval, analysis, and visualization of geologic data. GeolISS is implemented using ESRI ArcGIS technology and is designed to operate as a personal geodatabase (MS Jet 4.0 Engine) and as an SDE enterprise geodatabase in MS SQL Server. The objective of the GeolISS implementation is the integration of existing geologic archives, data from published maps at different scales, and newly acquired field data, as well as Web publishing of geologic information. Physical implementation ...
... that is implemented as a compilation of geologic vocabularies such as petrologic and mineralogic classification, geologic time scale, stratigraphic lexicon, etc. The terms in the vocabularies are used to classify observations/interpretations, or to specify attribute values. Observations implement field ...
Branislav Blagojević, Branislav Trivić, Ranka Stanković, Nenad Banjac, Olivera Kitanović. "Geologic Information System of Serbia" in Proceedings of the 17th Meeting of the Association of European Geological Societies, 14-18 September 2011, Beograd : Srpsko geološko društvo (2011)