A Novel Approach to Deepfake Detection: Leveraging Fused Facial and Body Dynamics With a CNN–Transformer Hybrid Network

Document Type

Article

Publication Title

IEEE Access

Abstract

The rapid advancement of generative models such as Generative Adversarial Networks (GANs) has contributed significantly to the creation of deepfake videos. These synthetic videos pose serious threats to personal privacy, public trust, and societal stability, as they manipulate reality and distort public perception. While Convolutional Neural Networks (CNNs) have shown progress in deepfake detection, many existing approaches struggle to capture temporal inconsistencies across video frames effectively. In this study, a novel hybrid deepfake detection model is proposed that uses both spatial and motion-based features. The model utilizes VGG16 to extract high-level facial features and Google MoveNet to capture upper-body pose information. Each video is divided into sequences of 20 frames, and the combined feature vectors of shape (20, 563) are passed through a deep learning architecture comprising a 1D convolutional layer followed by a Transformer encoder. This setup enables the model to learn both intra-frame and inter-frame dependencies. The model was trained and evaluated on a combined dataset of real and synthetic facial images, supplemented with an additional video dataset consisting of 408 authentic and 795 deepfake samples. Evaluation results demonstrate the effectiveness of the proposed approach, with the model achieving an accuracy of 84.48%. These findings show the potential of the proposed system for practical and reliable automated deepfake detection.
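The pipeline described above can be sketched as follows. This is a minimal, illustrative PyTorch reconstruction, not the authors' released code: the 563-dimensional per-frame vector is assumed to be 512 VGG16 features concatenated with 51 MoveNet values (17 keypoints × x, y, confidence), and the channel width, head count, layer count, and pooling choice are all placeholder hyperparameters not stated in the abstract.

```python
import torch
import torch.nn as nn

SEQ_LEN = 20    # frames per clip (from the abstract)
FEAT_DIM = 563  # assumed split: 512 VGG16 features + 17*3 MoveNet keypoint values


class HybridDetector(nn.Module):
    """1D-conv front end over per-frame features, then a Transformer encoder
    to model inter-frame (temporal) dependencies; illustrative sizes only."""

    def __init__(self, feat_dim=FEAT_DIM, conv_dim=128, heads=4, layers=1):
        super().__init__()
        # Conv1D mixes features within a local temporal window of frames
        self.conv = nn.Conv1d(feat_dim, conv_dim, kernel_size=3, padding=1)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=conv_dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(conv_dim, 1)  # clip-level real/fake score

    def forward(self, x):
        # x: (batch, 20, 563) -> Conv1d expects (batch, channels, time)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, 20, conv_dim)
        h = self.encoder(h)                               # temporal self-attention
        return torch.sigmoid(self.head(h.mean(dim=1)))    # pool over frames


model = HybridDetector()
out = model(torch.zeros(2, SEQ_LEN, FEAT_DIM))  # two dummy clips
print(out.shape)  # torch.Size([2, 1])
```

A real training setup would replace the dummy input with fused VGG16/MoveNet features extracted per frame and train the head with binary cross-entropy.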

First Page

197085

Last Page

197108

DOI

10.1109/ACCESS.2025.3632155

Publication Date

1-1-2025
