From Vision to Voice: A Multi-Modal Assistive Framework for the Physically Impaired
Document Type
Article
Publication Title
IEEE Access
Abstract
Providing people with visual and physical impairments access to textual content remains a difficult challenge. This work presents a desktop assistive system that automatically converts text found in images into audible speech. The application is developed in Python with a Tkinter interface, uses Tesseract OCR for optical character recognition, and captures images in real time through OpenCV. Through the googletrans library, the system supports text processing and translation across the more than 100 languages accessible via Google Translate. Extracted or translated text is converted to speech with Google Text-to-Speech (gTTS) and played back as .mp3 files through the system's default media player. The interface offers intuitive interaction through hover effects, accessible controls, and a dropdown menu for language selection. Its expandable design delivers multilingual text-to-speech capabilities that are useful in assistive technology applications for accessibility needs.
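The capture → OCR → translate → speak pipeline the abstract describes could be sketched as follows. The libraries (OpenCV/cv2, pytesseract, googletrans, gTTS) are the ones the paper names, but the exact function decomposition, call signatures, and the `speech_path` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the image-to-speech pipeline described in the abstract.
# Third-party imports are deferred into the functions that need them, so the
# pure helper below can be used without the assistive-system dependencies.
import os
import tempfile


def speech_path(directory: str = ".", stem: str = "output") -> str:
    """Build the .mp3 path that the synthesized speech is saved to."""
    return os.path.join(directory, stem + ".mp3")


def capture_frame():
    """Grab one frame from the default webcam (real-time capture via OpenCV)."""
    import cv2  # assumed dependency
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("camera capture failed")
    return frame


def image_to_text(frame) -> str:
    """Run Tesseract OCR on the captured frame."""
    import pytesseract  # assumed dependency
    return pytesseract.image_to_string(frame)


def translate_text(text: str, dest: str = "en") -> str:
    """Translate via the googletrans wrapper around Google Translate."""
    from googletrans import Translator  # assumed dependency
    return Translator().translate(text, dest=dest).text


def speak(text: str, lang: str = "en", path: str = "") -> str:
    """Synthesize speech with gTTS, save it as an .mp3, and return the path;
    the saved file would then be handed to the system's default media player."""
    from gtts import gTTS  # assumed dependency
    out = path or speech_path(tempfile.gettempdir())
    gTTS(text=text, lang=lang).save(out)
    return out
```

In a full application, a Tkinter callback bound to the capture button would chain these steps (`speak(translate_text(image_to_text(capture_frame()), dest=selected_lang), lang=selected_lang)`), with the destination language taken from the dropdown menu.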
First Page
128106
Last Page
128121
DOI
10.1109/ACCESS.2025.3590237
Publication Date
1-1-2025
Recommended Citation
Bhat, Suhas; Bhat, Prajwal; and Kolekar, Sucheta V., "From Vision to Voice: A Multi-Modal Assistive Framework for the Physically Impaired" (2025). Open Access archive. 14095.
https://impressions.manipal.edu/open-access-archive/14095