Article Clustering on Online News Portals Using the K-Means Method
Abstract
News portals cover such a wide range of categories that the editors' workload keeps growing. The volume of news articles published each month adds to the editors' task of manually sorting articles into predetermined categories. Clustering can group similar data together, so that articles belonging to the same category end up in the same group. K-Means is one method for performing clustering: a distance-based technique that partitions the data into a set of clusters and works only on numeric attributes. The K-Means tests conducted in this study are intended to compare cluster values. The K-Means models built in this study apply TF-IDF, feature selection, and PCA. Cluster values are assessed by visualizing, as bar plots, each metric under consideration, namely the mean silhouette, accuracy, precision, recall, F1-score, and silhouette score. The results show that the K-Means method can achieve 94.93% accuracy and recall, 95.07% precision, and a 94.94% F1-score.
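The core idea in the abstract, turning text into numeric TF-IDF vectors so that a distance-based method such as K-Means can group it, can be illustrated with a minimal sketch. This is not the study's implementation: the four toy documents, the function names (`tfidf_vectors`, `kmeans`), the smoothed IDF formula, and the "first k points" initialization are all assumptions made for illustration, and the feature selection and PCA steps are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized TF-IDF vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant; other formulas are also common).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] / len(doc) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain K-Means (Lloyd's algorithm) with a deterministic start."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus: two sports headlines and two economy headlines.
docs = [
    "team wins football match".split(),
    "bank raises interest rates".split(),
    "football team loses match".split(),
    "central bank cuts interest rates".split(),
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy run the two sports headlines share one cluster and the two economy headlines share the other. Metrics such as accuracy or precision would additionally require mapping each cluster to a reference category, which this sketch does not attempt.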