ESRVA: Enhanced Super-Resolution and Visual Annotation Model for Object-Level Image Interpretation Using Deep Learning

Document Type

Article

Publication Title

IEEE Access

Abstract

Computer vision enables machines to interpret and analyze visual information, driving applications that range from medical imaging to autonomous systems. Because these applications depend on the quality of their input images, image enhancement is essential when working with low-resolution, noisy, or degraded inputs. The Enhanced Super-Resolution Annotation for Vision-based Systems (ESRVA) research presents an image enhancement and annotation framework based on the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN). Its goal is to improve the sharpness, resolution, and perceptual accuracy of low-quality images to support downstream tasks such as object detection and segmentation. ESRVA uses a Residual-in-Residual Dense Block (RRDB) architecture and a perceptual loss to produce high-fidelity outputs that retain fine detail. After enhancement, the framework performs automatic image annotation through an instance segmentation pipeline. This pipeline uses deep feature maps from the enhanced images to localize and classify objects with high accuracy. The proposed method was evaluated on standard benchmarks, using bicubic downsampling to simulate low-resolution inputs. This provides a consistent basis for comparison, but it does not fully replicate real-world degradation. After 150 training epochs, the experimental evaluation shows appreciable improvement across performance metrics. The model achieved a Peak Signal-to-Noise Ratio (PSNR) of 36.66 dB, a Mean Squared Error (MSE) of 0.000216, and a Structural Similarity Index Measure (SSIM) of 0.9621, reflecting high similarity to the ground-truth images. It also recorded a perceptual loss of 0.018849, a mean Intersection over Union (mIoU) of 0.9746, and a Dice coefficient of 0.9924, confirming the accuracy of instance segmentation on the enhanced outputs. Among its social implications, the most direct application of ESRVA is in smart surveillance systems.
In low-light or bandwidth-constrained conditions, surveillance footage is typically of low resolution. When ESRVA is built into such systems, they can reconstruct clean, high-resolution scenes, improving the identification of people, cars, and events. This can improve public safety, support law enforcement, and increase the utility of smart city infrastructure and autonomous systems. By combining enhancement and annotation in a single pipeline, ESRVA provides a streamlined solution for applications that require high-quality visual inspection from low-quality input data.
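The reconstruction and segmentation metrics reported above can be reproduced with a short sketch. It assumes that pixel intensities are normalized to [0, 1], which the abstract does not state. Under that assumption, PSNR = 10·log10(1/MSE), and the reported MSE of 0.000216 gives 10·log10(1/0.000216) ≈ 36.66 dB, which matches the reported PSNR. The helper names (psnr_from_mse, iou_and_dice, mean_iou) are hypothetical. The abstract also does not specify how ESRVA averages IoU and Dice across instances, classes, or images, so the unweighted mean below is only one plausible convention.

```python
import math
from typing import Iterable, Sequence, Tuple

# Hypothetical helpers; pixel values are ASSUMED to be normalized to [0, 1].


def psnr_from_mse(mse: float, max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if mse <= 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def iou_and_dice(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """IoU and Dice for one pair of flattened binary masks (nonzero = object pixel)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty; treated as perfect agreement by convention
    iou = inter / union
    dice = 2.0 * inter / (pred_area + target_area)
    return iou, dice


def mean_iou(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Unweighted mean IoU over (prediction, ground truth) mask pairs."""
    ious = [iou_and_dice(p, t)[0] for p, t in pairs]
    if not ious:
        raise ValueError("need at least one mask pair")
    return sum(ious) / len(ious)


# Under the [0, 1] assumption, the reported MSE maps to the reported PSNR.
print(round(psnr_from_mse(0.000216), 2))  # 36.66

# Toy 2x2 masks, flattened: the prediction covers one extra pixel.
print(iou_and_dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

For a single mask pair, Dice = 2·IoU/(1 + IoU), so Dice is never lower than IoU. Values averaged over many masks need not satisfy this identity exactly.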

First Page

171666

Last Page

171683

DOI

10.1109/ACCESS.2025.3616216

Publication Date

1-1-2025