From Vision to Voice: A Multi-Modal Assistive Framework for the Physically Impaired

Document Type

Article

Publication Title

IEEE Access

Abstract

Providing people with visual and physical impairments access to textual content remains a difficult challenge. This work presents a desktop assistive system that automatically converts text found in images into audible speech. The application is developed in Python with a Tkinter-based interface, uses Tesseract OCR for optical character recognition, and captures images in real time through OpenCV. Through the googletrans library, the system supports multilingual operation, translating extracted text among the more than 100 languages accessible via Google Translate. Extracted or translated text is converted to speech with Google Text-to-Speech (gTTS) and played back as an .mp3 file through the system's default media player. The interface supports intuitive interaction through hover effects, accessible controls, and a dropdown menu for language selection. Its extensible design delivers multilingual text-to-speech capabilities that prove useful in assistive technology applications for accessibility needs.
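The pipeline the abstract describes (OCR an image, translate the text, synthesize speech) can be sketched as follows. This is not the authors' code: the function names and wiring are assumptions, and the three stages are injected as callables so the flow can be shown and exercised without a Tesseract install or network access. The comments note where the real libraries (pytesseract, googletrans, gTTS) would plug in.

```python
# Minimal sketch of the image-to-speech pipeline: OCR -> translate -> TTS.
# The stage functions are injected so the wiring is testable offline;
# in the described system they would be pytesseract, googletrans, and gTTS.
from typing import Callable

def image_to_speech(image_path: str,
                    ocr: Callable[[str], str],
                    translate: Callable[[str, str], str],
                    speak: Callable[[str], None],
                    dest_lang: str = "en") -> str:
    """Extract text from an image, translate it, speak it, and return it."""
    text = ocr(image_path).strip()           # e.g. pytesseract.image_to_string(path)
    if not text:
        return ""                            # nothing recognizable in the image
    translated = translate(text, dest_lang)  # e.g. Translator().translate(text, dest=...).text
    speak(translated)                        # e.g. gTTS(translated, lang=...).save("out.mp3")
    return translated

# Stub stages stand in for the real libraries in this demo.
spoken = []
result = image_to_speech(
    "page.png",
    ocr=lambda path: "Hola mundo",
    translate=lambda text, dest: "Hello world" if dest == "en" else text,
    speak=spoken.append,
)
```

With the real libraries, `ocr` would wrap `pytesseract.image_to_string` on an OpenCV frame, `translate` would call googletrans, and `speak` would save a gTTS .mp3 and launch the default media player.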

First Page

128106

Last Page

128121

DOI

10.1109/ACCESS.2025.3590237

Publication Date

1-1-2025

