Classification of Medicine Types Based on Patient Symptoms Using the K-Nearest Neighbors (KNN) Method
Abstract
This research applies the K-Nearest Neighbors (KNN) algorithm to classify medicine types based on patient symptoms, using a Kaggle dataset with 200 rows and 6 columns. After preprocessing steps such as handling missing values, encoding categorical variables, and splitting the data into training and testing sets, exploratory data analysis (EDA) was performed to understand the dataset's structure. The KNN model was evaluated with k values of 1, 2, and 3. The best result was obtained with k=3, which achieved an accuracy of 77.50% with an average precision of 0.76, recall of 0.69, and F1-score of 0.66. Accuracy was lower for k=1 (67.50%) and k=2 (65.00%), indicating that k=3 is the most effective of the tested values for this dataset. These results suggest that KNN is a viable method for classifying medicine types based on symptoms, although larger datasets are recommended to improve accuracy.
Keywords: K-Nearest Neighbors (KNN), classification, medicine, exploratory data analysis (EDA), preprocessing
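The workflow summarized in the abstract (encode categorical variables, split the data, evaluate KNN for k = 1, 2, 3) can be illustrated with a minimal sketch. It is not the study's implementation. The toy rows, the column names (age, sex, blood_pressure, medicine), the Euclidean distance, the nearest-neighbor tie-break, and the fixed seed are all assumptions made for illustration. The 80/20 split is also an assumption: the abstract does not state the split ratio, although the reported accuracies are consistent with a 40-row test set.

```python
import math
import random
from collections import Counter


def encode_column(values):
    """Label-encode a categorical column: each distinct value gets an integer code."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def knn_predict(train_X, train_y, x, k):
    """Return the majority label among the k training rows closest to x (Euclidean)."""
    neighbors = sorted(zip((math.dist(row, x) for row in train_X), train_y))
    votes = Counter(label for _, label in neighbors[:k])
    # On a tied vote, most_common keeps first-seen order, so the nearest label wins.
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical toy rows, NOT the Kaggle data: (age, sex, blood_pressure, medicine)
rows = [
    (23, "F", "HIGH", "drugA"),
    (47, "M", "LOW", "drugC"),
    (28, "F", "NORMAL", "drugX"),
    (61, "F", "LOW", "drugC"),
    (22, "M", "HIGH", "drugA"),
    (49, "F", "NORMAL", "drugX"),
    (41, "M", "LOW", "drugC"),
    (60, "M", "NORMAL", "drugX"),
    (43, "M", "HIGH", "drugA"),
    (34, "F", "HIGH", "drugA"),
]

# Preprocessing: keep age numeric, label-encode the categorical columns.
ages = [float(r[0]) for r in rows]
sex = encode_column([r[1] for r in rows])
bp = encode_column([r[2] for r in rows])
X = [list(features) for features in zip(ages, sex, bp)]
y = [r[3] for r in rows]

# Assumed 80/20 train/test split with a fixed seed for repeatability.
indices = list(range(len(rows)))
random.Random(0).shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
train_X, train_y = [X[i] for i in train_idx], [y[i] for i in train_idx]
test_X, test_y = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Evaluate the same k values as the abstract.
results = {k: accuracy(train_X, train_y, test_X, test_y, k) for k in (1, 2, 3)}
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2%}")
```

The sketch leaves features unscaled, so the age column dominates the distance. Whether the study scaled its features is not stated in the abstract. The accuracies printed for this toy data have no bearing on the reported results.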