Diagnosis of Sarcoidosis Through Supervised Ensemble Method and GenAI-Based Data Augmentation: An Intelligent Diagnostic Tool

Document Type

Article

Publication Title

Applied Sciences (Switzerland)

Abstract

Sarcoidosis, one of the rarest diseases, is challenging to diagnose because its symptoms mimic those of other diseases. Machine learning algorithms can identify hidden patterns among symptoms, which makes them suitable for the early diagnosis of Sarcoidosis. In this study, four ensemble models are developed from baseline classifiers and applied to a symptom-based secondary dataset to explore its hidden information. The dataset comprises 189 patient records with 14 attributes: 2 serum markers, 10 symptoms, the patient's gender, and 1 target variable. Exploratory data analysis is carried out together with the necessary preprocessing, including missing-value imputation and data scaling. The features are selected using PCA, and their relevance is analyzed using the Chi-Square Test, Mutual Information, Sequential Feature Selection, and Tree-Based Selection methods. Because the dataset contains only 189 records, CTGAN, a GenAI technique, is used to augment it. CTGAN preserves the clinical fidelity of all features pertaining to the diagnosis of Sarcoidosis, ensuring that the synthetic data retains meaningful diagnostic patterns. The models are evaluated on both the original and the synthetic data. The proposed ensemble methods Model Combinations 1, 3, and 4 achieved 99.47% accuracy on the original dataset, whereas Model Combination 1 achieved 85.19% accuracy on the original data combined with 81 synthetic records, and the Random Forest classifier achieved 60.78% accuracy on the original data combined with 1000 synthetic records. This highlights the combined advantage of CTGAN-based augmentation and ensemble learning in enhancing diagnostic modeling for rare diseases such as Sarcoidosis, for which only small datasets are available.
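
As an illustration of the feature-relevance step mentioned in the abstract, the following is a minimal sketch of how a chi-square statistic and mutual information could be computed for a binary symptom against a binary diagnosis label. It uses only the Python standard library. The toy values and the names `target`, `symptom_a`, and `symptom_b` are assumptions made up for this example; they are not taken from the study's dataset, and the authors' actual implementation is not described here.

```python
import math
from collections import Counter


def chi_square(feature, target):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    f_counts = Counter(feature)
    t_counts = Counter(target)
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n          # count expected under independence
            observed = joint.get((f, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(feature, target):
    """Mutual information (in bits) between two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    f_counts = Counter(feature)
    t_counts = Counter(target)
    mi = 0.0
    for (f, t), count in joint.items():
        p_joint = count / n
        # Ratio p(f, t) / (p(f) * p(t)), written in terms of raw counts
        ratio = count * n / (f_counts[f] * t_counts[t])
        mi += p_joint * math.log2(ratio)
    return mi


# Hypothetical toy data: 1 = present / positive, 0 = absent / negative
target = [1, 1, 1, 1, 0, 0, 0, 0]
symptom_a = [1, 1, 1, 1, 0, 0, 0, 0]   # tracks the label exactly
symptom_b = [1, 0, 1, 0, 1, 0, 1, 0]   # unrelated to the label

scores = {
    name: (chi_square(values, target), mutual_information(values, target))
    for name, values in {"symptom_a": symptom_a, "symptom_b": symptom_b}.items()
}

for name, (chi2, mi) in scores.items():
    print(f"{name}: chi2={chi2:.2f}, MI={mi:.2f} bits")
```

In this toy setting a symptom that tracks the label scores high on both measures and an unrelated symptom scores zero, which is the kind of ranking a relevance analysis relies on. With only 189 records, expected cell counts can be small, so chi-square values on real data of this size should be read with caution.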

DOI

10.3390/app152212213

Publication Date

11-1-2025
