Transparent, Low Resource, and Context-Aware Information Retrieval From a Closed Domain Knowledge Base
Document Type
Article
Publication Title
IEEE Access
Abstract
In large-scale enterprises, vast amounts of textual information are shared across corporate repositories and intranet websites. Traditional search techniques that lack context sensitivity often fail to retrieve pertinent data efficiently. Modern techniques that use distributed representations of words require a considerable training dataset and substantial computation, imposing financial and operational burdens. Generative models for information search suffer from problems of transparency and hallucination, which can be detrimental, especially for organizations and stakeholders who rely on these results for critical business operations. This paper presents a non-goal-oriented conversational agent based on a collection of finite state machines and an information retrieval model for searching an extensive collection of stored corporate documents and intranet websites. We used a distributed representation of words derived from the BERT model, which allows for contextual searching, and minimally fine-tuned a BERT model on a multi-label text classification task specific to a closed-domain knowledge base. Based on discounted cumulative gain (DCG) metrics, our information retrieval model, which uses distributed embeddings from the minimally trained BERT model and Word Mover's Distance (WMD) to calculate topic similarity, returns results more relevant to user queries than both BERT embeddings with cosine similarity and BM25. Our architecture promises to significantly expedite and improve the accuracy of information retrieval in closed-domain systems without requiring a massive training dataset or expensive computing, while maintaining transparency.
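The abstract compares retrieval models by DCG. The sketch below shows how DCG rewards rankings that place highly relevant documents near the top. It assumes the common formulation, in which the relevance grade at rank i is discounted by log2(i + 1). The abstract does not state the paper's DCG variant, relevance scale, or rank cutoff. The function names and the two example rankings are hypothetical illustrations, not results from the paper.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..k.

    Assumption: standard log2 discount; the paper's exact variant is not given.
    """
    if k is not None:
        relevances = relevances[:k]  # keep only the top-k ranked results
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance (0-3) of the top five documents that two
# retrieval models return for the same query; these numbers are made up.
ranking_a = [3, 2, 3, 0, 1]
ranking_b = [1, 0, 2, 3, 3]

# ranking_a scores higher because its most relevant documents appear at ranks 1 and 3.
print(f"A: DCG={dcg(ranking_a):.3f}, nDCG={ndcg(ranking_a):.3f}")
print(f"B: DCG={dcg(ranking_b):.3f}, nDCG={ndcg(ranking_b):.3f}")
```

Both rankings contain relevant documents. DCG separates them because it discounts relevance by position, so a model that surfaces the right documents first scores higher than one that buries them lower in the list.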
First Page
44233
Last Page
44243
DOI
10.1109/ACCESS.2024.3380006
Publication Date
1-1-2024
Recommended Citation
Rateria, Shubham and Singh, Sanjay, "Transparent, Low Resource, and Context-Aware Information Retrieval From a Closed Domain Knowledge Base" (2024). Open Access archive. 7134.
https://impressions.manipal.edu/open-access-archive/7134