Classification of Drinking Water Quality Using the Random Forest Classifier Algorithm and GridSearchCV
Abstract
Drinkable water is water that is safe for human consumption and poses no significant health risk. Whether water meets health standards can be determined from the substances and minerals it contains. Conventional methods take a long time to evaluate and classify water as fit or unfit for consumption. One approach to this problem is machine learning. This research uses a random forest for classification. A random forest with default settings does not necessarily give optimal performance, because the default hyperparameter values are not necessarily the best ones. Therefore, this research also uses GridSearchCV to find optimal hyperparameter values for the Random Forest Classifier. Hyperparameter tuning yielded an optimal model with n_estimators = 100, max_depth = 9, max_features = 4, and min_samples_split = 2. After tuning, the performance of the random forest improved: accuracy rose from 76% to 84%, precision from 76.19% to 81.70%, recall from 74.89% to 85.53%, and F1-score from 75.53% to 83.57%.
Keywords: Classification, Drinking Water Quality, Random Forest, Optimal Hyperparameters, Hyperparameter Tuning, GridSearchCV
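The tuning procedure summarized in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the data are a synthetic stand-in generated with make_classification, and the candidate values in param_grid, the 3-fold cross-validation, the 80/20 split, and the accuracy scoring are all assumptions. Only the four hyperparameter names come from the abstract, so the scores it prints will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data (assumption): a placeholder for the water-quality dataset.
X, y = make_classification(
    n_samples=400, n_features=9, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline: a random forest with default hyperparameters.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

# Candidate values are illustrative; the abstract names only the four parameters.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 9],
    "max_features": [2, 4],
    "min_samples_split": [2, 5],
}

# Exhaustive search over every combination, scored by cross-validated accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# GridSearchCV refits the best combination on the full training set,
# so predict() uses the tuned model.
best_params = search.best_params_
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

print("baseline accuracy:", baseline_accuracy)
print("best parameters:", best_params)
print("tuned metrics:", metrics)
```

Evaluating on a held-out test set that the grid search never sees keeps the reported metrics separate from the cross-validation scores used to pick the hyperparameters.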