Search
119 items
-
An Approach to Development of Bilingual Lexical Resources
... available lexical resources. Terms electronic learning and e-learning and their Serbian translational equivalents elektronsko učenje and e-učenje do not exist in either of the resources. Hence the English synset {electronic learning, e-learning} and its Serbian counterpart {elektronsko učenje ...
Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech, and hate speech, has reached pandemic levels. A system capable of detecting such texts could help make the internet and social media a better, more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
... advantages. First of all, the classification can be done in several steps. Traditional machine learning can also be used at every step, which in the case of a smaller number of class examples gives better results than deep learning [31]. Another advantage is a simpler structure of the annotation decision tree ...
... mark it as non-abusive ([27, 6]). This annotation approach was chosen to facilitate automatic detection of abusive speech by a system based on machine learning techniques. There are also cases when negation ...
... results on similar data sets for other languages ([25, 44, 47]). The focus of our current research is the usage of a hybrid approach that combines machine learning and lexical resources. Finally, a user-friendly interface that will enable the use of these resources on the Web is under development. As for ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...
... for Serbian - The Case of Personal Names | Branislava Šandrih, Cvetana Krstev, Ranka Stanković | Proceedings - Natural Language Processing in a Deep Learning World | 2019 | | 10.26615/978-954-452-056-4_122 http://dr.rgf.bg.ac.rs/s/repo/item/0005243 Digital Repository of the Faculty of Mining and Geology ...
... types. There are three categories of NER systems: 1) The rule-based (RB) (Krupka and Hausman, 1998; Friburger and Maurel, 2004); 2) the Machine Learning (ML) based (Finkel and Manning, 2009; Singh et al., 2010); and 3) hybrid methods (Jansche and Abney, 2002). The ML-based methods can often be “black ...
... used to produce the corpus of newspaper texts annotated with personal names – the gold standard. Section 3 describes NER systems based on Machine Learning methods that were trained on the corpus derived from the gold standard, while the evaluation and discussion of results are presented in Sections 4 ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Towards a Mining Equipment Ontology
... Obradović, I., Kitanović, O., Prodanović, J., Ilić, V.: An approach to implementation of blended learning in a university setting. In Proceedings of the Second International Conference on e-Learning, eLearning 2011, Belgrade, Serbia, D. Milošević (ed.), Metropolitan University, pp. 61-66, 2011. ...
... management of exploitation, mine safety or mining equipment management. Such terminological resources have successfully been generated for the FMG e-learning platform Moodle [11]. A RudOnto ontology could thus ultimately serve as a tool for generating a mining equipment ontology. In this paper we explain ...
... generating specialized terminological resources. As we have already mentioned it has been used in practice for generating terminology for the FMG e-learning platform Moodle, and we will describe here how it can be used for production of OWL ontologies. The transformation is performed by a wizard, which ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Towards a Mining Equipment Ontology" in Proceedings of the 12th International Conference Research and Development in Mechanical Industry, RaDMI 2012, September 2012, Vrnjačka Banja, Serbia no. 1, Vrnjačka Banja, Serbia : SaTCIP (Scientific and Technical Center for Intellectual Property) Ltd. (2012)
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and researching verbal aspectual pairs in BCS (Bosnian, Croatian, and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since no resource exists to help learners of Bosnian, Croatian, and Serbian as foreign languages recognize the aspect of a verb or its pairs, we created a new resource that gives users information about aspect, as well as a link to the aspectual pairs of a verb. This resource also contains external links to monolingual dictionaries, Wordnet, and BabelNet. ...
Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
Development of terminological resources for expert knowledge: a case study in mining
Ljiljana Kolonja, Ranka Stanković, Ivan Obradović, Olivera Kitanović, Aleksandar Cvjetić. "Development of terminological resources for expert knowledge: a case study in mining" in Knowledge Management Research & Practice, Palgrave Macmillan (2015). https://doi.org/10.1057/kmrp.2015.10
-
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, as are the possibilities for its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...
... to develop a manually annotated corpus that would be used as a training dataset for supervised machine learning systems. An automatic semantic role labelling experiment, based on supervised machine learning is also described in the paper. The most frequent verbs, semantic roles and typical semantic-syntactic ...
... 2018, Miyazaki, Japan, May 7-12, 2018, 48–56. Stanković, Ranka, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, and Mihailo Škorić. 2020. “Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian.” In Proceedings of The 12th LREC – Language Resources and ...
... scrolled through online, but also downloaded and used locally. As the website states, it can be used for different purposes: as a dictionary for language learning (since it contains more than 13,000 LUs); as a valence dictionary; as a training dataset for semantic role labeling which makes it a rich digital ...
Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
-
A comparison between ARIMA, LSTM, ARIMA-LSTM and SSA for cross-border rail freight traffic forecasting: the case of Alpine-Western Balkan Rail Freight Corridor
Miloš Milenković, Miloš Gligorić, Nebojša Bojović, Zoran Gligorić. "A comparison between ARIMA, LSTM, ARIMA-LSTM and SSA for cross-border rail freight traffic forecasting: the case of Alpine-Western Balkan Rail Freight Corridor" in Transportation Planning and Technology, Informa UK Limited (2023). https://doi.org/10.1080/03081060.2023.2245389
-
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the Hurtlex dictionary, a multilingual lexicon of offensive words. We pay particular attention to adding multi-word expressions (polylexemic units) that can be considered offensive, since such lexical entries are very important for achieving good results in a multitude of abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources in Serbian is highlighted, and the construction of a system for ...
... sentiment lexicon. They took words with negative polarity as a baseline for creating a basic lexicon of 551 words, which was further enriched via machine learning into a lexicon of 2898 abusive words. Several authors used the Wiegand lexicon as a blacklist in their hate speech and abusive language detection ...
... with a special focus given to MWEs, but there is still much to be done. Options of using a hybrid approach that would merge a dictionary with machine-learning will be explored. Finally, a user-friendly interface that will enable the usage of these resources on the Web is under development. We plan to ...
... Declerck, J. Gracia and B. Klimek, pages 48–56. Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, and Mihailo Škorić. 2020. Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. In Proceedings of The 12th Language Resources and Evaluation ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...
raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data
... Intelligence, Granada, Spain, 4–6 November 2015; Volume 1495, pp. 81–89. 35. Stankovic, R.; Šandrih, B.; Krstev, C.; Utvić, M.; Škorić, M. Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. In Proceedings of The 12th Language Resources and Evaluation ...
... side upside down [5]. Wide adoption of mobile devices has created new ways of learning through interaction and communication and they are becoming integrated in the lives of today’s students, enhancing mobility of the learning process. Thus, for example, Language for Specific Purposes (LSP) dictionaries ...
... dictionary called MobiLex was produced at the Stellenbosch University in South Africa to enhance teaching and learning of historical terms, with favorable pedagogical consequences regarding the learning of such terms. Trends and developments in technology offer the possibility of changing the face of lexi ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions.
... s based on machine learning techniques have not encountered them in training corpora, while those based on lexical resources do not have them listed in lexicons. However, their adequate treatment is crucial for many applications, e.g. text-to-speech systems (Taylor, 2009), machine translation ...
... sometimes performed manually (Tsimpouris et al., 2014), by using some heuristics (Yeates, 1999; Schwartz and Hearst, 2003; Wren et al., 2002) or machine-learning methods (Jacobs et al., 2014). A window in which definitions of acronyms are looked for is usually narrow – definitions appear in local ...
... non-local expansions of acronyms (they need not appear in same documents as acronyms). The third task can be tackled by using supervised machine-learning techniques in order to assign the appropriate sense to ambiguous acronyms and abbreviations (Moon et al., 2012). Authors in (Ranchhod et al ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%).
... tasks of text analytics. The mainstream technology of automatic document categorization is the machine-learning approach. This approach assumes that there is a sufficient training collection for learning the algorithms. However, many organizations have a need in automatic text categorization, when ...
... subject headings) may be absent and should be created from scratch or with the use of existing similar categorial systems. In such conditions, machine-learning approaches cannot be applied, and knowledge-based methods of text categorization, i.e. exploiting manual rules for describing categories, are ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
-
Simulation of Hydrogeological Environmental Discharge in Case of Interruption Constant Observations
Marina Čokorilo Ilić, Dragoljub Bajić, Miroslav Popović. "Simulation of Hydrogeological Environmental Discharge in Case of Interruption Constant Observations" in International Scientific Conference - Sinteza 2024, Belgrade, 16 May 2024, Singidunum University (2024). https://doi.org/10.15308/Sinteza-2024-288-294
-
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...
... for improvement are given in Section 6. 2 RELATED WORK In automatic computational irony detection supervised machine learning and pattern matching techniques are used equally. Machine learning looks at a problem of verbal irony detection as a binary classification problem with classes of ironic and ...
... studies, given by the Inkpot group [11] within the Rhetfig project, which combines linguistic and rhetorical theories with discourse analysis and machine learning to develop formal models of computational rhetoric, one definition of sarcasm is “Use of mockery, verbal taunts, or bitter irony”. For this reason ...
... words in a tweet (OSA) and irony markers (described in Section 3) (M). We used these features to train a MaxEnt classifier, a supervised machine learning algorithm which we implemented using MaxEnt SharpEntropy library on 5-folded cross-validated dataset. The classification results according to the ...
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
-
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present a Serbian literary corpus that is being developed under the umbrella of the COST Action "Distant Reading for European Literary History" CA16204. Using this corpus of novels written more than a century ago, we developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities, with a convolutional neural network (CNN), which has an F1 score of ≈91% on the test dataset. This model was further evaluated on a separate evaluation dataset. We conclude with a comparison ...
... Task. In Proceedings of the NTCIR-15 Conference. Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, and Mihailo Škorić. 2020. Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. In Proceedings of the 12th Language Resources and Evaluation ...
... Named Entity Translation in NMT. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 45–51, Lisboa, Portugal. European Association for Machine Translation. Eleni Partalidou, Eleftherios Spyromitros-Xioufis, Stavros Doropoulos, Stavros Vologiannidis ...
... Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić | Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications | 2021 | | 10.26615/978-954-452-072-4_141 http://dr.rgf.bg.ac.rs/s/repo/item/0005139 Digital ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
-
Landslide Hazard in Serbia in the 21st Century
Biljana Abolmasov (2019)
... 74: 91-100. [23] Marjanović, M., Kovačević, M., Bajat B., Mihalić, S., Abolmasov, B. (2011). L... Assessment of Starča Basin (Croatia) Using Machine Learning Algor... Geotechnica Slovenica 2011 (2): 45-55. [24] ...
... Geologiques de la Peninsule Balkanique. Marjanović, M., Kovačević, M., Bajat B., Mihalić, S., Abolmasov, B. Assessment of Starča Basin (Croatia) Using Machine Learning Algo... Geotechnica Slovenica 2011 (2): 45-55. ... (2017). ... 3: 21-34. Gariano SL, Guzzetti F. (2016). Landslides in a changing climate ... Reviews ...
... 5000, ... 100 km2). ... AHP ... [18], ... (Support Vector Machine, SVM) [21]. ...
Biljana Abolmasov. "Landslide Hazard in Serbia in the 21st Century" (Хазард од клизишта у Србији у 21. веку) in Geohazard in Serbia in the 21st Century – Knowledge Is the Best Bulwark Against the Elements, Serbian Academy of Sciences and Arts (2019)
-
A Model for Determining Fuzzy Evaluations of Partial Indicators of Availability for High-Capacity Continuous Systems at Coal Open Pits Using a Neuro-Fuzzy Inference System
This paper presents a model for determining fuzzy evaluations of partial indicators of the availability of continuous systems at coal open pits using a neuro-fuzzy inference system. The system itself is a combination of fuzzy logic and artificial neural networks. The system availability is divided into partial indicators. By combining the fuzzy logic and artificial neural networks, a model is obtained that has the ability to learn and uses expert judgment for that learning. This paper deals with the ...
... systems, continuous mining systems (bucket-wheel excavator-conveyor-crushing plant), mining, availability, soft computing, fuzzy logic, ANN, ANFIS
Miljan Gomilanović, Miloš Tanasijević, Saša Stepanović, Filip Miletić. "A Model for Determining Fuzzy Evaluations of Partial Indicators of Availability for High-Capacity Continuous Systems at Coal Open Pits Using a Neuro-Fuzzy Inference System" in Energies, MDPI AG (2023). https://doi.org/10.3390/en16072958
-
Multihazard Exposure Assessment on the Valjevo City Road Network
Miloš Marjanović, Biljana Abolmasov, Svetozar Milenković, Uroš Đurić, Jelka Krušić, Mileva Samardžić Petrović (2019)
Miloš Marjanović, Biljana Abolmasov, Svetozar Milenković, Uroš Đurić, Jelka Krušić, Mileva Samardžić Petrović. "Multihazard Exposure Assessment on the Valjevo City Road Network" in Spatial Modeling in GIS and R for Earth and Environmental Sciences, Elsevier Inc (2019). https://doi.org/10.1016/B978-0-12-815226-3.00031-4
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...
aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
... Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., and Dubourg, V. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830. Repar, A. and Pollak, S. (2017). Good Examples for Terminology Databases in Translation Industry ...
... score (F1). After several different classifiers evaluation, Gradient Boost model (Friedman, 2001) implementation from the scikit-learn toolkit for Machine Learning for Python (Pedregosa et al., 2011) turned out to have the best performance on this dataset. Gradient boosting is an iterative technique that ...
... (2001). Greedy Function Approximation: a Gradient Boosting Machine. Annals of statistics, pages 1189–1232. Heafield, K. (2011). KenLM: Faster and Smaller Language Model Queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197. Association for Computational ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... Knowledge-free induction of morphology using latent semantic analysis. In Proc. of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7, Stroudsburg: Association for Computational Linguistics, pp. 67-72. Smadja, F. (1993). Retrieving ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23-28 May 2016, European Language Resources Association (2016)