Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Objects

Type
Paper in proceedings
Paper version
published version
Language
English
Creator
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović
Source
Trans. Computational Collective Intelligence - Lecture Notes in Computer Science
Editor
Ngoc Thanh Nguyen, Ryszard Kowalczyk, Alexandre Miguel Pinto and Jorge S. Cardoso
Publisher
Springer
Publication date
2017
Abstract
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs, and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database, developed over many years for geological projects in the Republic of Serbia, were indexed using methods developed within digital humanities: bag-of-words and named entity recognition. Documents in this geological database are described by a summary report and other data, such as title, domain, keywords, abstract, and geographical location. These metadata were used for generating a bag of words for each document with the aid of morphological dictionaries and transducers. Named entities within metadata were also recognized with the help of a rule-based system. Both the bag of words and the metadata were then used for pre-indexing each document. A combination of several tf-idf based measures was applied for selecting and ranking retrieval results of indexed documents for a specific query, and the results were compared with the initial retrieval system that was already in place. In general, a significant improvement has been achieved according to the standard information retrieval performance measures, where the InQuery method performed the best.
start page
162
end page
185
doi
10.1007/978-3-319-59268-8_8
isbn
978-3-319-59267-1
Broader paper category
M30
Narrower paper category
M33
Rights
Open access
License
Creative Commons – Attribution-NonCommercial-No Derivative Works 4.0 International
Format
.pdf
Volume
26

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8