Enhancing Visual Question Answering for Multiple Choice Questions
Document Type
Article
Publication Title
IEEE Access
Abstract
This paper examines enhancements to Visual Question Answering (VQA) through systematic hyperparameter tuning and the use of advanced image and text encoders. In particular, the study explores adapting these models to the Multiple-Choice Question (MCQ) format, aiming to improve their accuracy and applicability. An MCQ consists of a question stem and a set of options, from which the correct answer (the key) must be identified among the distractors. Presenting the task as an MCQ gives the model some context about the correct answer, which improves its performance over a simple multiclass classification task. Through a comparative analysis of varied hyperparameter sets, the research shows that precise hyperparameter adjustments improve the performance of VQA systems and highlights the models' improved reasoning capabilities across various datasets, including samples of real-world images and academic questions. These results demonstrate the potential of VQA models for robust application in both educational and practical scenarios.
First Page
93453
Last Page
93467
DOI
10.1109/ACCESS.2025.3572529
Publication Date
1-1-2025
Recommended Citation
Goel, Rashi; Nandwani, Harsh; Shah, Eshaan; and Nayak, Ashalatha, "Enhancing Visual Question Answering for Multiple Choice Questions" (2025). Open Access archive. 13880.
https://impressions.manipal.edu/open-access-archive/13880