APAROS: A Clustering-Based Hybrid Approach for Handling Overlapped Regions in Imbalanced Datasets

Document Type

Article

Publication Title

IEEE Access

Abstract

In many real-world datasets, the class distribution is highly imbalanced, and minority class samples often lie within majority-class regions, leading to significant overlap. These challenges bias the model and cause misclassifications, particularly for minority classes. To address these issues, we introduce a novel technique that integrates Affinity Propagation with Adaptive Synthetic Sampling (ADASYN) and Random Oversampling (ROS), collectively termed APAROS. Using Affinity Propagation Clustering (APC), the proposed method categorizes the dataset into Overlapping Clusters (OLC), Pure minority Clusters (PmC) and Pure Majority Clusters (PMC). We then apply existing oversampling methods, ADASYN in OLC and ROS in PmC, balancing the dataset while minimizing the risk of misclassification in overlapped regions. The proposed method was evaluated on ten benchmark imbalanced datasets and compared against existing resampling techniques such as ADASYN, ROS and Kmeans-SMOTE. Seven machine learning classifiers were employed to evaluate model performance using accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC) and Cohen’s Kappa coefficient. Experimental results consistently demonstrate that APAROS performs significantly better than traditional resampling techniques, particularly in highly overlapped and imbalanced scenarios.
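
The cluster categorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster assignments are already available (for example from an Affinity Propagation run) and that a cluster counts as overlapping whenever it contains both minority and majority samples; the abstract does not state the exact criterion. The function name `categorize_clusters` and the toy data are hypothetical.

```python
from collections import defaultdict


def categorize_clusters(cluster_ids, class_labels, minority_label):
    """Group cluster ids into OLC / PmC / PMC by the classes each cluster contains.

    Assumption (not from the paper): a cluster is "overlapping" if it holds
    at least one minority sample and at least one non-minority sample.
    """
    # Collect the set of class labels observed in each cluster.
    classes_in_cluster = defaultdict(set)
    for cid, y in zip(cluster_ids, class_labels):
        classes_in_cluster[cid].add(y)

    groups = {"OLC": [], "PmC": [], "PMC": []}
    for cid in sorted(classes_in_cluster):
        classes = classes_in_cluster[cid]
        if classes == {minority_label}:
            groups["PmC"].append(cid)   # pure minority cluster
        elif minority_label not in classes:
            groups["PMC"].append(cid)   # pure majority cluster
        else:
            groups["OLC"].append(cid)   # mixed, i.e. overlapping cluster
    return groups


# Hypothetical toy data: one cluster id and one class label per sample.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
class_labels = [0, 0, 1, 1, 1, 0, 0, 0]  # 1 is treated as the minority class
groups = categorize_clusters(cluster_ids, class_labels, minority_label=1)
print(groups)
```

Under this sketch, samples in the OLC clusters would then be passed to ADASYN and samples in the PmC clusters to ROS, as the abstract describes, while PMC clusters would be left unchanged.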

First Page

170705

Last Page

170722

DOI

10.1109/ACCESS.2025.3614993

Publication Date

1-1-2025

