Exploratory Data Analysis and Water Potability Classification using Supervised Machine Learning Algorithms
Document Type
Article
Publication Title
Engineering Technology and Applied Science Research
Abstract
This study investigates the assessment of water potability using supervised machine-learning techniques. The task is to predict whether water is potable from its chemical and physical parameters, a prediction that matters for public health and environmental sustainability. Exploratory Data Analysis (EDA) revealed feature distributions and correlations that guided the preprocessing steps and model selection. The Synthetic Minority Oversampling Technique (SMOTE) was applied to mitigate class imbalance before model training. Three classification algorithms, namely Logistic Regression (LR), K-Nearest Neighbors (KNN), and Random Forest (RF), were evaluated. RF performed best after Optuna hyperparameter tuning, achieving an accuracy of 68%. A weighted voting-based ensemble of RF and KNN raised the accuracy to 71%. This study emphasizes the importance of leveraging machine learning to support water quality assessment, offering tools for decision-making in public health and environmental management.
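The weighted voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the ensemble votes on class labels or on predicted probabilities, so this example assumes soft voting (a weighted average of each model's predicted probability of potability). The function names, the weights, and the probability values are all hypothetical and are not taken from the article.

```python
def weighted_soft_vote(prob_by_model, weights):
    """Weighted average of per-model P(potable) estimates, one value per sample.

    prob_by_model: dict mapping a model name to a list of probabilities.
    weights: dict mapping the same model names to non-negative weights.
    """
    if set(prob_by_model) != set(weights):
        raise ValueError("models and weights must use the same names")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(next(iter(prob_by_model.values())))
    combined = []
    for i in range(n_samples):
        # Normalise by the weight total so the result stays in [0, 1].
        score = sum(weights[m] * prob_by_model[m][i] for m in prob_by_model) / total
        combined.append(score)
    return combined


def predict_labels(combined, threshold=0.5):
    """Map averaged probabilities to labels: 1 = potable, 0 = not potable."""
    return [1 if p >= threshold else 0 for p in combined]


# Hypothetical probabilities for three water samples (illustrative only).
probs = {"rf": [0.80, 0.40, 0.30], "knn": [0.60, 0.70, 0.20]}
# Hypothetical weights; the article's actual weights are not given in the abstract.
weights = {"rf": 0.6, "knn": 0.4}

combined = weighted_soft_vote(probs, weights)
labels = predict_labels(combined)
print(labels)
```

In this sketch the second sample shows why weighting matters: RF alone would reject it (0.40) while KNN would accept it (0.70), and the weighted average decides the outcome.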
First Page
20898
Last Page
20903
DOI
10.48084/etasr.8904
Publication Date
4-1-2025
Recommended Citation
Priya, Kamath B.; Sharma, Geetanjali; Bongale, Anupkumar; and Dharrao, Deepak, "Exploratory Data Analysis and Water Potability Classification using Supervised Machine Learning Algorithms" (2025). Open Access archive. 13506.
https://impressions.manipal.edu/open-access-archive/13506