Multimodal sentiment analysis using image and text fusion for emotion detection
Document Type
Article
Publication Title
Discover Computing
Abstract
Social media has become an essential platform for expressing personal experiences and emotions. Today’s youth frequently share images that reflect their emotional states, including happiness, excitement, sadness, anxiety, and distress. Accurately analyzing these images with new frameworks can offer valuable insights into individuals’ emotional well-being. Beyond mental health applications, image sentiment analysis has significant potential in marketing and advertising. Brands and marketers can gain a more comprehensive understanding of consumer sentiments and preferences by examining the emotional reactions elicited by visual content. For instance, companies can analyze images shared by customers to gauge sentiment towards their products and services; positive or negative feedback expressed through images offers practical insights for improving products and the customer experience. Sentiment analysis also helps marketers gauge the effectiveness of advertising campaigns: by analyzing the sentiments of images associated with a campaign, they can determine which aspects resonate most with the audience and adjust their strategies accordingly. Our research focuses on an advanced multimodal sentiment analysis system that combines BERT and Vision Transformers (ViT) to analyze textual and image data. The technique achieves high-precision sentiment classification on a preprocessed AllenTAN dataset from Hugging Face: it performs sentiment analysis with BERT, generates captions for unlabeled images, and uses OCR to extract text embedded in images. The proposed ViT + BERT approach handles a variety of social network content well, achieving an accuracy of 96.91% and demonstrating robust performance across diverse social media content and in comparison with benchmark models. This technology has several applications, particularly in social media monitoring to promote mental health content, as teens frequently use images to describe their feelings.
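To illustrate the kind of BERT and ViT fusion described in the abstract, the following is a minimal late-fusion sketch in PyTorch using Hugging Face transformers. The pretrained model names, fusion-layer sizes, and number of sentiment classes are assumptions for illustration, not the authors' exact configuration.

import torch
import torch.nn as nn
from transformers import BertModel, ViTModel

class ViTBertFusionClassifier(nn.Module):
    """Late fusion of BERT text features and ViT image features for sentiment classification (illustrative sketch)."""
    def __init__(self, num_classes=3):  # number of sentiment classes is an assumption
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        # Concatenate the two 768-dimensional [CLS] representations and classify.
        self.classifier = nn.Sequential(
            nn.Linear(768 + 768, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        # Text branch: pooled [CLS] embedding from BERT, shape (batch, 768)
        text_feat = self.text_encoder(input_ids=input_ids,
                                      attention_mask=attention_mask).pooler_output
        # Image branch: ViT [CLS] token embedding, shape (batch, 768)
        image_feat = self.image_encoder(pixel_values=pixel_values).last_hidden_state[:, 0]
        # Fuse the two modalities and predict sentiment logits
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.classifier(fused)

In practice, the text fed to BERT could combine the original post text with OCR-extracted text and a generated caption, as the abstract describes; those preprocessing steps are omitted from this sketch.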
DOI
10.1007/s10791-025-09756-2
Publication Date
12-1-2025
Recommended Citation
Deshpande, Uttam U.; Shanbhag, Supriya; Sukhasare, Amit; and Dixit, Mahendra M., "Multimodal sentiment analysis using image and text fusion for emotion detection" (2025). Open Access archive. 11903.
https://impressions.manipal.edu/open-access-archive/11903