Search
66 items
-
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos (2024)
We present ongoing activities on defining a lexicon-corpus interface that will serve as a reference for representing multiword expressions (of various types: nominal, verbal, etc.) in dedicated lexicons and for linking these entries to their occurrences in corpora. The final aim is to use such resources for the automatic identification of multiword expressions in text. Involving several natural languages aims at a universal solution that is not centered on a particular language, while also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword ...
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos. "Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word-sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating low- and medium-resourced European languages. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and a comparison with annotations in other languages. The first steps are discussed ...
Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which is of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...
Keywords: LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval
Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
-
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the Hurtlex dictionary, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the building of a system for ...
... Antonio Moreno-Ortiz, Chantal Pérez-Hernández, and Maria Del-Olmo. 2013. Managing multiword expressions in a lexicon-based sentiment analysis system for Spanish. In Proceedings of the 9th Workshop on Multiword Expressions, pages 1–10. Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and ...
... publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 74–84 Barcelona, Spain (Online), December 13, 2020. 74 Multi-word Expressions for Abusive Speech Detection in Serbian Ranka Stanković University of Belgrade ranka@rgf ...
... Serbian | Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev | Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons | 2020 | | http://dr.rgf.bg.ac.rs/s/repo/item/0005015 The Digital Repository of the Faculty of Mining and Geology, University of Belgrade, provides access to ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
-
Development of Open Educational Resources (OER) for Natural Language Processing
In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...
... queries in the form of regular expressions whose basic elements are either word forms (strings) or lexical masks that refer to the content of e-dictionaries. 5. The advanced methods of text searching are introduced: morphological filters – regular expressions that enable string search at the ...
... many valuable resources for Serbian were already developed. This course covers a broad range of topics such as pattern recognition using regular expressions, electronic dictionaries, Finite-state automata and transducers, etc. Within the course different didactic forms were used including text, video ...
... answering, text summarization, collocations and information retrieval, sentiment analysis and semantics, discourse, machine translation, regular expressions, language models, text classification, and named entity recognition. All of them combine textual and video lectures with quizzes and assignments ...
Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
-
The Nooj System as Module within an Integrated Language Processing Environment
... more precisely, that contain the XML tag with the content “administration”. WS4LR offers predefined Xpath expressions, but the user can easily define his/her own expressions. Once the user has retrieved the synsets of interest from the wordnet, he/she can now proceed to their modification ...
... different languages with a single query is opened. NooJ supports morphological query expansion and expansion of queries by graphs and regular expressions. The integration of the morphological power of NooJ with the semantic and multilingual power of wordnets may best be illustrated by concordance ...
... can be retrieved from wordnets into the two available wordnet windows using various methods, from simple string matching to complex Xpath expressions. The user can, for example, specify one or two strings, depending on whether he/she wants to search one or both wordnets for synsets containing ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions.
... Instead of universal NooJ expressions, which turned out to be unfeasible, mutually corresponding NooJ expressions were produced on basis of semantic codes discovered in the previous step through the analysis of annotations (text dictionary). These expressions were subsequently applied to ...
... stored in the system of electronic dictionaries (Vitas et al. 2008), in the next step of the analysis, an attempt was made to construct NooJ expressions that could extract the same concepts in all languages. However, that turned out to be a rather difficult task, due to the fact that semantic ...
... 411 Hum +Hum 250 +Hum 2706 +Hum 414 +Hum 3661 HumColl +HumColl 20 +HumColl 139 +CollHum 94 Table 5. NooJ expressions used with noun patterns Another experiment along the same lines was performed with NooJ graphs. In order to analyze the results of the application ...
Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... with regular expressions and inflectional and morphological finite state transducers (FSTs) to locate morphological, lexical and syntactic patterns, remove ambiguities, and tag simple and compound words in texts. The text parsing possibilities offered by regular expressions and FSTs proved ...
... geology, or more precisely, that contain the XML tag with the content “geology”. WS4LR offers predefined Xpath expressions, but the user can also define these expressions him/herself. Once the user has retrieved the synsets of interest from the wordnet, he/she can now proceed to their ...
... Since all three systems (Intex, Unitex, and NooJ) provide for processing of texts on basis of dictionaries, in combination with regular expressions and FSTs, and each of them has some useful specific features, the dictionary management module allows the user to activate the functions of ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...
... developed several years ago. It has been designed to recognize the main classes of NEs: 1) numerical expressions (measurement and money), 2) temporal expressions (date and time), and 3) name expressions (personal, geopolitical and organization names). The system was designed in a form of the cascades ...
... modeling, etc. The first Named Entity set had 7 types (Grishman and Sundheim, 1996): organization, location, person, date, time, money and percent expressions. Sekine et al. (2002) proposed a NE hierarchy which contains about 150 NE types. There are three categories of NER systems: 1) The rule-based (RB) ...
... transducers and on e-dictionaries of Serbian (Vitas and Krstev, 2012). E-dictionaries play an important role specifically in the recognition of name expressions, since, beside general lexica, they contain many proper names, both personal and geopolitical. The system is modular which means that steps can be ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Managing mining project documentation using human language technology
Purpose: This paper aims to develop a system that would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ...
Keywords: Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation
Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017)
The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...
... that they will not be mistaken for :/ emoticon. Finally, regular expressions [a|h|A|H][a|h|A|H][h|H][a|A][h|H][a|h|A|H] and [h|H][a|A][h|H][a|A] are used to find as many different examples of expression of laughter, and the expressions found are replaced with hahaha and haha respectively. If this option ...
... found. After each of the found expressions string, which marks the beginning of the token, is added. – Then the end of each message is found, by searching the string – which is the ending tag of each message node. After each of the found expressions string is added, to get ...
... phrases that appear in the conversation, which by nature are not of universal meaning and reflect a positive or negative attitude, replacing facial expressions and/or intonation in the written text. The intensity values of a determiner directly affects the intensity value which he transfer onto the term ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
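The laughter normalization quoted in the excerpts for this entry can be sketched in a few lines of Python. This is only an illustration, not the tool from the paper: the function name `normalize_laughter` and the constants are hypothetical, and the character classes drop the `|` that appears in the quoted patterns, because inside `[...]` a pipe is a literal character rather than an alternation operator.

```python
import re

# Hypothetical re-creation of the two patterns quoted in the excerpt.
# The "|" is omitted on purpose: inside a character class it would match
# a literal pipe instead of meaning "or".
LONG_LAUGH = re.compile(r"[ahAH][ahAH][hH][aA][hH][ahAH]")
SHORT_LAUGH = re.compile(r"[hH][aA][hH][aA]")


def normalize_laughter(text: str) -> str:
    """Replace laughter variants with the canonical tokens hahaha / haha."""
    # Apply the six-character pattern first so a long laugh is handled
    # as one unit before the shorter pattern runs.
    text = LONG_LAUGH.sub("hahaha", text)
    return SHORT_LAUGH.sub("haha", text)


print(normalize_laughter("HAHAHA that was good"))
print(normalize_laughter("Haha ok"))
```

Like the quoted patterns, this sketch uses no word boundaries, so it would also rewrite a matching letter sequence inside a longer word.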
-
Development and evaluation of fracture gradient curve model: a case study of south-central Kazakhstan
An accurate fracture gradient value is one of the most important issues in oil and gas well design and drilling. The value of a fracture gradient is a critical parameter for determining the drilling mud weight and selecting the proper depths for setting the casing in the planning process of drilling operations. This paper proposed the new curve model of fracture gradient for the south-central Kazakhstan (central Asia) region based on the analysis of leak-off test and formation integrity ...
Branimir Stanisavljevic, Vesna Karovic Maricic, Irena Isakov. "Development and evaluation of fracture gradient curve model: a case study of south-central Kazakhstan" in International Journal of Oil, Gas and Coal Technology, Inderscience Publishers (2022). https://doi.org/10.1504/IJOGCT.2023.128042
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020)
In science, industry and many research fields, terminology is developing rapidly. Most often, the language that serves as the lingua franca for most of these fields is English. As a consequence, in many fields domain terms are conceived in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a tool for chunk alignment. We examine the performance of the method on the domain ...
... 5 (2017): 763–788 Baldwin, Timothy and Su Nam Kim. “Multiword Expressions”. Handbook of Natural Language Processing Vol. 2 (2010): 267–292 Bouamor, Dhouha, Nasredine Semmar and Pierre Zweigenbaum. “Identifying Bilingual Multi-Word Expressions for Statistical Machine Translation”. In Proceedings ...
... Engineering (TKE 2012), June, 20–21. 2012 Princeton WordNet, 2010 Semmar, Nasredine. “A Hybrid Approach for Automatic Extraction of Bilingual Multiword Expressions from Parallel Corpora”. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), chair), Nicoletta ...
... European Language Resources Association (ELRA), 2012 Constant, Mathieu, Gülşen Eryiğit, Johanna Monti, Lonneke Van Der Plas, Carlos Ramisch et al. “Multiword Expression Processing: A Survey”. Computational Linguistics Vol. 43, no. 4 (2017): 837–892 Cram, D. and B. Daille. “Terminology Extraction with ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic levels. A system able to detect such texts could help make the internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
... arXiv:1603.07709. 42 Ranka Stanković, Jelena Mitrović, Danka Jokić, and Cvetana Krstev. Multi-word Expressions for Abusive Speech Detection in Serbian. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 74–84, 2020. 43 Julien Tissier, Christophe Gravier, and Amaury ...
... and then the site or social network moderators manually review the report. More advanced platforms use systems with regular expressions and “black” lists of words and expressions, to catch abusive language and remove posts [25]. There are also online portals such as HateBase.org that collect examples ...
... of abusive expressions will also take into consideration phrases and figurative speech as indicators. In addition to the improved version of Hurtlex [2], resources that can be useful for the creation of a lexicon of offensive words are lists of swear words, curses, abusive expressions, existing general ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...
... for супротан ‘antonym’); 3. related words; 6. examples: 1. the text of the example 2. the bibliographic reference (in parenthesis); 5) multiword expressions (syntagmatic and phraseological – they are listed in the separate paragraph beginning with the abbreviation Изр. for Израз ‘phrase’; and ...
... Figure 1: The microstructure ...
... standard Serbo-Croatian language from the beginning of the 19th century to the present day, as well as about 300 word collections (provincial expressions, dialectical variations, etc.) of all Shtokavian dialects. The paper version of the dictionary has a complex microstructure that was designed at ...
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
Punctuated, episodic magmatism and mineralization of the Rogozna skarn-hosted Au-Zn-Pb-Cu deposits revealed through high-precision U-Pb zircon geochronology
The subvolcanic regions of magmatic centers are commonly associated with alteration, mineralization, and economic ore deposits; however, the duration and frequency of mineralizing pulses within the overall lifespan of these centers can be poorly defined. Therefore, models for the formation of mineral systems require more high-precision geochronology data to refine their evolutionary models. Rogozna Mountain and its eponymous magmatic complex, located in SW Serbia, hosts multiple base metal deposits associated with variable rock types and structural expressions and serves as a natural laboratory to ...
Keywords: Skarn mineralization, Magmatic-hydrothermal systems, Subvolcanic intrusions, zircon geochronology, CA-ID-TIMS
Sean P. Gaynor, Milorad D. Antić, Vladica Cvetković, Kristina Šarić, Urs Schaltegger. "Punctuated, episodic magmatism and mineralization of the Rogozna skarn-hosted Au-Zn-Pb-Cu deposits revealed through high-precision U-Pb zircon geochronology" in Ore Geology Reviews, Elsevier (2023). https://doi.org/10.1016/j.oregeorev.2023.105775
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...
... years) is derived from the basic model, which is based on the input of keywords, single or multi-word units (MWU), that can be combined into Boolean expressions. In addition to general search which goes through all fields in the relevant tables, the search can also be performed using specific criteria. For ...
... Entity Recognition. According to [13] the term “Named Entity” (NE) usually refers to names of persons, locations and organizations, and numeric expressions including time, date, money and percentage. Recently other major types are being included, like “products” and “events”, but also marginal ones ...
... hierarchy in our Named Entity Recognition (NER) system consists of five top-level types: persons, organizations, locations, amounts, and temporal expressions, each of them having one or more levels of sub-types. Our tagging strategy allows nesting, which means that a named entity can be nested within another ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Речници у дигиталном добу - информатичка подршка за српски језик [Dictionaries in the Digital Age: IT Support for the Serbian Language]
Биљана Рујевић (2022)
Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were stored as files whose number kept growing, which made managing the dictionaries harder, a need arose to move the dictionary information into a lexicographic database. To enable several users to work on dictionary development simultaneously, a need arose for a web application based on the lexicographic database. In order to consider ...
Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
-
A Mathematical Learning Environment Based on Serbian Language Resources
In recent years, in line with ever growing usage of Information technology, the learning environments are changing. The amount of available learning materials in various forms has increased. These new environments demand comprehensive learning systems, which enable management of the learning corpus with special attention paid to relevant lexical resources. In this paper we present the concept of a Mathematical Learning Environment in Serbian (MLES), which is based on a corpus of mathematical materials and various lexical resources, enabling ...
... ekstremum.N:ms6q A large number of terms in mathematics, as in other domains, are multiword expressions (MWE). Thus a procedure described in [12] has been used for semi-automatic extraction of MWEs on basis of lexical resources and local ...
... processing mathematical formulae, the main problem being different notation depending on the context. For instance there can exist different expressions for the same mathematical content, with the same meaning such as: $\frac{1}{x} = 1/x = x^{-1}$ On the other hand, an expression can represent ...
... augmented annotation and search. During the processing of mathematical formulae, augmented annotation can be realized, which can cover different expressions of the same formula. 4. CORPUS PROCESSING RESULTS Mathematical terminology in Serbian is unsatisfactorily represented in terminological resources ...
Radojičić Marija, Obradović Ivan, Stanković Ranka, Utvić Miloš, Kaplar Sebastijan. "A Mathematical Learning Environment Based on Serbian Language Resources" in Proceedings of the 7th International Scientific Conference Technics and Informatics in Education, Faculty of Technical Sciences, Čačak (2018)
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...
Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
... //www.meta-net.eu/whitepapers. Baldwin, T. and Kim, S. N. (2010). Multiword expressions. Handbook of natural language processing, 2:267–292. Bouamor, D., Semmar, N., and Zweigenbaum, P. (2012). Identifying bilingual multi-word expressions for statistical machine translation. In Nicoletta Calzolari ...
... of the University of Belgrade. Moirón, B. V. and Tiedemann, J. (2006). Identifying idiomatic expressions using automatic word-alignment. In Proceedings of the EACL 2006 Workshop on Multi-word expressions in a multilingual context, pages 33–40. Och, F. J. and Ney, H. (2003). A Systematic Comparison ...
... corroborate this claim).1 3. A large portion of MWT terms in Serbian has a limited number of syntactic structures. Namely, 98% of all 1Multiword expressions (MWE) are lexical units composed of more than one word, which are syntactically, semantically, pragmatically, and/or statistically idiosyncratic ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)