Search
108 items
-
Preparation of Multimedia Document “YU Rock Scene”
SUMMARY: This study presents the preparation process of a multimedia document entitled YU ROCK SCENE, in which the participants were senior undergraduate students of the Department of Library and Information Science at the University of Belgrade Faculty of Philology during the academic year 2014/2015, as part of the course Multimedia Documents. The study gives an overview of the historical development of rock and roll in the territory of the former Yugoslavia, the rock scene in the Yugoslav republics, ...... Modified: 2023-10-14 04:07:58 Milena Obradović, Aleksandra Arsenijević, Mihailo Škorić Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF]
... www.dr.rgf.bg.ac.rs Professional paper Preparation of Multimedia Document “YU Rock Scene” UDC 004.55:378.147]:02(497.11) DOI 10.18485/infotheca.2016.16.1_2.6 ABSTRACT: This study presents the preparation process of the multimedia document entitled “YU ROCK SCENE” in which participants were senior ...
... technical implementation of the project, including stages such as planning, processing, designing and creation of the multimedia document itself. KEYWORDS: multimedia document, library science, information science, rock and roll, music, Yugoslavia PAPER SUBMITTED: 28 March 2016 PAPER ACCEPTED: 06 May ...Milena Obradović, Aleksandra Arsenijević, Mihailo Škorić. "Preparation of Multimedia Document “YU Rock Scene”" in Infotheca - Journal for Digital Humanities, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2016.16.1_2.6
-
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019) This paper is the result of a task presented to attendees of the Keyword Search in Big Linked Data summer school, organized by the Vienna University of Technology under the Keystone COST action in the summer of 2017. It presents a specific approach to classification via the creation of minimal document surrogates based on the US National Library of Medicine’s MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...... Infotheca Vol. 19, No. 1, September 2019 57 Škorić M., Dragoni M., “Medical document classification...”, pp. 55–69 with the largest sim(document, document class) value, i.e. the class most similar to the document that is the subject of the classification. The problem that arises in calculating ...
... 3. Noise removal. This stage should enable and provide better results for the document classification. 4. Document classification based on their identifier vectors and a simple set of rules. 5. Evaluation of document classification performance for each of the sets used. This stage allows us to reflect ...
... a new document: it begins with the title of the document (without extension) followed by a semicolon, and all the concept identifiers found in it separated by commas (Figure 8). We will illustrate the transformation of one of the starting documents by steps on a simple example. In a document fragment ...Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
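The surrogate format described in the snippet above (the document title without its extension, a semicolon, then the concept identifiers separated by commas) can be sketched as follows. This is only a minimal illustration, not the authors' code: the function names and the sample identifiers are hypothetical placeholders.

```python
from pathlib import Path


def make_surrogate(filename: str, concept_ids: list[str]) -> str:
    """Build one surrogate line: '<title>;<id1>,<id2>,...'."""
    title = Path(filename).stem  # drop the file extension
    return f"{title};{','.join(concept_ids)}"


def parse_surrogate(line: str) -> tuple[str, list[str]]:
    """Split a surrogate line back into its title and identifier list."""
    title, _, ids = line.partition(";")
    return title, [i for i in ids.split(",") if i]


# Hypothetical example: the identifiers below are placeholders.
line = make_surrogate("report_017.txt", ["D000001", "D000002"])
print(line)
```

The sketch assumes that titles contain no semicolon; a real pipeline would need to escape or reject such titles.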
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...... representation of the document content is generated, formally referred to as the document surrogate, with the aim of increasing document retrieval efficiency, through a better matching of user needs and retrieval results. Document surrogates typically consist of metadata about the document, such as title, ...
... dl is the average document length, k1 = 1.2, k2 = 0.75 length normalisation; 5. Creating a dictionary of the whole document collection from all words selected in Step 4. For each term Tk in the document collection, k = 1, ..., M, where M is the size of the dictionary of the document collection: (a) c ...
... shows that the system found 4 results, and appropriate document snippets are displayed. For each document there is also a link enabling the user to look into further details on the retrieved document. The user can choose the maximum number of retrieved ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
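The parameter values quoted in the snippet (k1 = 1.2 and a length-normalisation constant of 0.75) match the usual defaults of Okapi BM25 term weighting. The sketch below assumes that this standard formulation is what the snippet refers to; the snippet is truncated, so the paper's exact formula may differ. The function and argument names are hypothetical, and the constant the snippet calls k2 is named `b` here, following common BM25 notation.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Standard Okapi BM25 weight of one term in one document (assumed variant)."""
    # Inverse document frequency, in the smoothed form that never goes negative.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Length normalisation: documents longer than average are penalised.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# A term occurring 3 times in an average-length document,
# present in 5 of 1000 documents (made-up numbers).
print(bm25_term_weight(tf=3, doc_len=250, avg_doc_len=250, n_docs=1000, doc_freq=5))
```

A query-document score is then the sum of these weights over all query terms.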
-
Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution
This paper explores the effectiveness of parallel stylometric document embeddings in solving the authorship attribution task by testing a novel approach on literary texts in 7 different languages, totaling in 7051 unique 10,000-token chunks from 700 PoS and lemma annotated documents. We used these documents to produce four document embedding models using Stylo R package (word-based, lemma-based, PoS-trigrams-based, and PoS-mask-based) and one document embedding model using mBERT for each of the seven languages. We created further derivations of these ...Mihailo Škorić, Ranka Stanković, Milica Ikonić Nešić, Joanna Byszuk, Maciej Eder. "Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution" in Mathematics, MDPI AG (2022). https://doi.org/10.3390/math10050838
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...... the similarity between the query and the document is ranked based on the sum of weights for all words in the query. Document in Figure 1 is about a project that deals with gold. When searching with the keyword zlato ‘gold’ the old system ranks the document as 125th with general search and as 84th ...
... related to Information Retrieval (IR) are the presentation of document content, the presentation of information needs and the comparison of these two representations. Presentation of documents as a rule contains metadata about the document, such as title, abstract, location or, in case of indexing of ...
... primary key of the document in the database. If the search is performed by scanning textual documents, then their additional representation is not required. In order to increase efficiency, especially in the case of large collections, a formal representation surrogate of each document is usually formed ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
-
The Nooj System as Module within an Integrated Language Processing Environment
... English query ‘document’ is expanded by the English wordnet to ‘document, written document, papers’, which is then transformed by ILI to a set of Serbian keywords: ‘dokument, papir, akt’ that can be further expanded to all inflectional forms by NooJ inflectional graphs: ‘dokument, dokumenta, dokumentu ...
... of techniques in which a query serving as input to a document retriever is evolved in some way with the intent to improve the document retriever's performance, according to some metric. Query expansion is particularly applicable to document retrieval components that provide a Boolean query model ...
... tasks such as query expansion. By this we mean the techniques in which a query, serving as input to a document retriever, is transformed in some way in order to improve the performance of document retrieval. Namely, a search by a concept instead of a search by a single word form is recognized as ...Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017) The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...... y becomes the first node in the document and the procedure starts again. This procedure assigns attributes to all the tokens except for the last one, ERR0001 token, which was created during tokenization of the document and is used for marking the end of the document, and for calculating the attribute ...
... replaces them with an empty string. To further ensure that the document is well-formed, the character & is replaced with a whitespace and a new root element is introduced. Preprocessing is finished with an XSL transformation that transforms a document into one that can be additionally processed (Figure 4). All ...
... attribute, each emotext value string is replaced with emotext_value, and then, each whitespace in the document is replaced with string. Character _ becomes whitespace again and document structure becomes: . The second ...... ... Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...... text of a specific document, he/she can obtain it by clicking on the “metadata” link shown in the leftmost column in Figure 6. This opens the metadata window (Figure 7) showing the document metadata in both languages and offering further links to full texts of the document in both languages (pdf) ...
... es, are formatted and presented to the user. The concordances are preceded by information identifying the document they originate from, and a link to summary metadata for this document in both languages. Figure 3. Bibliša+LeXimir UML component model 4. Supporting resources ...
... highlighted in both languages. Each concordance line is preceded by an identification of the document it originates from. Within this identification is also a link to full metadata of the document. A part of concordances for the initial query digitalna biblioteka is presented in Figure 6. ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%).... Automatic Document Processing for Text Analytics The main stages of thesaurus-based document processing include: • Tokenization and lemmatization, that is, the transfer of word forms to dictionary forms (lemmas); • Matching with the thesaurus based on the lemma representation of the document. Multiword ...
... migration rate, Decline in birth rate, Demographic prognosis, etc.; • Forming the conceptual index of the document. The conceptual index of a document consists of concepts found in the document and their assigned weights. The weight of a concept accounts for the 4(https://www.rbc.ru/economics/17/11/ ...
... automatically constructed thematic representation of documents that models the main topic and sub-topics of the document in sets (thematic nodes) of similar concepts mentioned in the document (Loukachevitch and Dobrov, 2015). Such a basis for the text categorization makes it possible to process texts ...Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
-
The Use of the Omeka Semantic Platform for the Development of the University of Belgrade, Faculty of Mining and Geology Digital Repository
Under the regulations of the Ministry of Education, Science and Technological Development, a digital repository based on the Omeka S data storage platform has been developed for the Faculty of Mining and Geology. The platform has been upgraded with the required modular extensions, Solr index and automatic OCR. Furthermore, document indexing and search have been fine-tuned with the aid of e-dictionaries of the Serbian language, which has brought about outstanding results in terms of usage facilitation and overall ...Petar Popović, Mihailo Škorić, Biljana Rujević. "The Use of the Omeka Semantic Platform for the Development of the University of Belgrade, Faculty of Mining and Geology Digital Repository" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2020.20.1_2.9
-
Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)
Biljana Rujević, Mihailo Škorić (2024) The paper describes linking the Digital Repository of the University of Belgrade, Faculty of Mining and Geology, with the eScience system in terms of transferring metadata about the results of researchers' scientific work. The steps taken to ensure a smooth harvesting of metadata are outlined. Additionally, a presentation of additional improvements to the OAI system is provided, aiming to contribute to the automatic linking of authors with their results in the eScience system. Biljana Rujević, Mihailo Škorić. "Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)" in Infotheca, Faculty of Philology, University of Belgrade (2024). https://doi.org/10.18485/infotheca.2023.23.2.4
-
Managing mining project documentation using human language technology
Purpose: This paper aims to develop a system that would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ... Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
-
Towards translation of educational resources using GIZA++
... consists of XML document (eXtensible Markup Language) preparation according to TEI (Text Encoding Initiative) consortium guidelines. In practice, this step is comprised of marking the divisions, titles, paragraphs and segments using text or XML editing software with support for DTD (Document Type Definition) ...
... however, must be corrected manually, which is done through the Concordancier software. [13] The next step is the production of a TMX document [14]. The document consists of, , (paragraph), (Translation Unit) and (Translation unit variant) elements. [15] Metadata code ...
... order to establish a direct relation to metadata and the original (pdf, edX, docx,…) form of resource document, article, course or other resource. Image 2 presents one part from the TMX document with ID: 1.2010.1.4. From aligned TMX documents it is easy to produce parallel text form for tools like ...Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
-
Novi koncept izrade Osnovne hidrogeološke karte Srbije
Igor Jemcov, Zoran Stevanović, Vladimir Živanović, Saša Milanović, Dušan Polomčić, Veselin Dragišić (2022) The Basic Hydrogeological Map (OHGK) is a fundamental document in hydrogeology. Its purpose is to give an overview of the basic aquifer types, and thereby of the groundwater resources in the area covered by the map. Applying the existing Guidelines for the Preparation of the Basic Hydrogeological Map of the SFRY 1:100,000 (from 1984 and 1988) involves numerous difficulties, which is why there have been several initiatives over the past 30 years to draw up new Guidelines. Considering the current situation, along with the facts about contemporary trends in the development of hydrogeological maps in ...Igor Jemcov, Zoran Stevanović, Vladimir Živanović, Saša Milanović, Dušan Polomčić, Veselin Dragišić. "Novi koncept izrade Osnovne hidrogeološke karte Srbije" in Zbornik radova XVI srpskog Simpozijum o hidrogeologiji sa međunarodnim učešćem, Univerzitet u Beograd, Rudarsko-geološki fakultet (2022)
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bi-lingual, with papers simultaneously being in English ...... tured data. MarkLogic is a document database that has evolved from a native XML DBMS database to enterprise NoSQL. In one platform, it combines a database, search engine and application services. The preliminary alignment phase consists of preparing an XML document (eXtensible Markup Language) ...
... Environment) (Utvić et al., 2007). The TMX document consists of TU (Translation Unit) and TUV (Translation Unit Variant) elements, where each TUV is a segment in one of the languages. The following example illustrates a single aligned segment (TU) of a document: BAEKTEL 1.1 ...
... segments (sentences) have to be XML tagged. Any text editing software with support for well-formedness checking and validation according to a DTD (Document Type Definition) or XML Schema can be used for that purpose. The next key step is the alignment itself: the task is to establish relations between ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
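The TU/TUV structure mentioned in the snippets above can be illustrated with a small parsing sketch. The sample below is not the example from the paper, which did not survive in this listing. It is a hypothetical one-unit TMX fragment following the general TMX layout (`tu` elements holding one `tuv` per language, each with a `seg`), read with Python's standard library. The function name and sample text are assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one translation unit with an English and a Serbian variant.
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Digital library</seg></tuv>
      <tuv xml:lang="sr"><seg>Digitalna biblioteka</seg></tuv>
    </tu>
  </body>
</tmx>"""


def aligned_segments(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        units.append({tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")})
    return units


print(aligned_segments(TMX_SAMPLE))
```

Keeping each unit as a language-keyed dictionary makes it straightforward to write the two sides out as parallel text files for alignment or translation tools.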
-
An Approach to Development of Bilingual Lexical Resources
... e-dictionaries, Serbian and English wordnets connected via the interlingual index, and a bilingual Dictionary of Librarianship, as well as on a TMX document collection generated from aligned Serbian-English journal articles published in INFOtheca, a scientific journal in the area of Library and Information ...
... Retrieval]: Digital Libraries – Collection General Terms Documentation, Languages Keywords Digital libraries, aligned parallel texts, TMX document collections, multilingual lexical resources, bilingual search 1. INTRODUCTION Multilingual information exchange is growing in importance and ...
... bilingual Serbian-English scientific journal, INFOtheca (http://infoteka.bg.ac.rs), covering the field of Library and Information Sciences. A TMX document collection was generated from INFOtheca articles using another of our tools, named ACIDE, an integrated development environment for generating ...Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
Synergistic Effect of the Insulation Characteristics of Gas Mixtures under the Influence of Pulse Voltages
Kartalović Nenad, Stanković Koviljka, Aleksandrović Snežana, Brajović Dragan. "Synergistic Effect of the Insulation Characteristics of Gas Mixtures under the Influence of Pulse Voltages" in IEEE Transactions on Dielectrics and Electrical Insulation 23 no. 6, IEEE-Inst Electrical Electronics Engineers Inc (2017): 3311-3318. https://doi.org/10.1109/TDEI.2016.005871
-
Combining Heterogeneous Lexical Resources
... language for addressing parts of an XML document. XPath treats an XML document as a tree of interrelated branches and nodes. A node in an XML document can be an element, attribute, processing instruction, comment, textual content, namespace, and document itself. The XPath tree model is based not ...
... to elements, and so on. For instance, the following XPath expression (10) //SYNSET[POS='n' and not(ILR/TYPE='hypernym')] retrieves from an XML document representing wordnet using the XSD from the Fig 1 all the nouns without hypernyms, i.e. first in a hierarchy. In this environment an Integrated ...Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal , May 2004, vol. 4, ELRA - European Language Resources Association (2004)
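To make the quoted XPath expression concrete, the sketch below applies the same selection to a tiny, made-up wordnet fragment. Only the element names SYNSET, POS, ILR and TYPE come from the expression itself. The root element, the ID element and the sample data are hypothetical, as the XSD from the paper's Fig 1 is not reproduced in this listing. Python's `xml.etree.ElementTree` supports only a subset of XPath, without `and` or `not()`, so the predicate is spelled out in Python rather than passed as an XPath string.

```python
import xml.etree.ElementTree as ET

# Hypothetical wordnet fragment; only SYNSET, POS, ILR and TYPE are taken
# from the XPath expression, the remaining names are placeholders.
WORDNET_SAMPLE = """<WN>
  <SYNSET><ID>s1</ID><POS>n</POS></SYNSET>
  <SYNSET><ID>s2</ID><POS>n</POS><ILR>s1<TYPE>hypernym</TYPE></ILR></SYNSET>
  <SYNSET><ID>s3</ID><POS>v</POS></SYNSET>
  <SYNSET><ID>s4</ID><POS>n</POS><ILR>s1<TYPE>holo_part</TYPE></ILR></SYNSET>
</WN>"""


def top_level_nouns(xml_text):
    """IDs of synsets matching //SYNSET[POS='n' and not(ILR/TYPE='hypernym')]."""
    root = ET.fromstring(xml_text)
    result = []
    for synset in root.iter("SYNSET"):
        # POS='n' holds if any POS child has the text 'n'.
        is_noun = any(pos.text == "n" for pos in synset.findall("POS"))
        # ILR/TYPE='hypernym' holds if any relation is typed 'hypernym'.
        has_hypernym = any(t.text == "hypernym" for t in synset.findall("ILR/TYPE"))
        if is_noun and not has_hypernym:
            result.append(synset.findtext("ID"))
    return result


print(top_level_nouns(WORDNET_SAMPLE))
```

With a full XPath 1.0 engine, the expression from the snippet could be evaluated directly instead of being unrolled into a loop.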
-
Fourth Summer Datathon on Linguistic Linked Open Data
Tijana Radović, Ranka Stanković (2023) The 4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22) was held in Cercedilla, near Madrid, Spain, in May 2022, and was organized by the COST Action NexusLinguarum. The school gathered interested researchers, academics and students who wanted to acquire and/or expand their knowledge in the field of linguistic linked data science. During the school, a spectrum of topics from the field of linked data was presented, from various ontologies, through document integration, annotation and natural language text processing tools ...Tijana Radović, Ranka Stanković. "Fourth Summer Datathon on Linguistic Linked Open Data" in Infotheca, Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.6