Enhancing Visual Question Answering for Multiple Choice Questions

Document Type

Article

Publication Title

IEEE Access

Abstract

This paper examines enhancements in Visual Question Answering (VQA) achieved by systematically tuning hyperparameters and using advanced image and text encoders. The study focuses on adapting these models to the Multiple-Choice Question (MCQ) format, with the aim of refining their accuracy and applicability. An MCQ consists of a question stem and a set of options, from which the correct answer (the key) must be identified among the distractors. Presenting the options gives the model some context about the correct answer, which improves its performance over a simple multiclass classification task. Through comparative analysis of varied hyperparameter sets, the research shows that precise hyperparameter adjustments improve the performance of VQA systems and highlights their improved reasoning capabilities across various datasets, including samples of real-world images and academic questions. This demonstrates the potential of VQA models for robust application in both educational and practical scenarios.
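
The MCQ terminology in the abstract (stem, options, key, distractors) can be made concrete with a small sketch. This is an illustration only, not the paper's implementation: the names MCQ, pick_option, and accuracy are hypothetical, and the per-option scores are assumed to come from some VQA model that the sketch does not include.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MCQ:
    """Hypothetical container for one multiple-choice question."""
    stem: str            # the question text
    options: List[str]   # the key plus the distractors
    key_index: int       # position of the correct answer within options


def pick_option(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring option (first one on ties)."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(questions: Sequence[MCQ],
             all_scores: Sequence[Sequence[float]]) -> float:
    """Fraction of questions whose top-scoring option is the key.

    all_scores[i][j] is an assumed model score for option j of question i.
    """
    if len(questions) == 0 or len(questions) != len(all_scores):
        raise ValueError("need one non-empty score list per question")
    correct = sum(
        pick_option(scores) == q.key_index
        for q, scores in zip(questions, all_scores)
    )
    return correct / len(questions)
```

Because the prediction is restricted to the listed options, the model only has to rank a handful of candidates per question rather than choose from a full answer vocabulary, which is the contrast with plain multiclass classification that the abstract draws.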

First Page

93453

Last Page

93467

DOI

10.1109/ACCESS.2025.3572529

Publication Date

1-1-2025
