News Classification by Category Using Multinomial Naïve Bayes with K-Fold Cross-Validation and Chi-Squared Feature Selection
Language: Indonesian
Abstract
Classifying news articles by category is an important problem in text analysis and natural language processing. Online news articles are still often categorized manually, which is complex and time-consuming. This motivates an automatic system that assigns news articles to categories such as technology, sports, and entertainment. The proposed system classifies news articles with the Multinomial Naïve Bayes method, using TF-IDF term weighting and Chi-Squared feature selection. The model is trained on 10,000 features selected from the original 54,091. Evaluation shows that the approach yields an accurate news classifier, with accuracy, precision, recall, and F1-score all reaching 96%.
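The abstract compresses two preprocessing steps into one sentence: TF-IDF weighting, then Chi-Squared selection that keeps 10,000 of 54,091 terms. The following is a minimal pure-Python sketch of both steps, not the authors' implementation. The TF-IDF variant (raw term count times the smoothed idf ln((1 + N) / (1 + df)) + 1), the whitespace tokenizer, the toy documents, and the function names `tfidf`, `chi2_scores`, and `select_k_best` are all assumptions made for illustration, since the paper's exact variant is not stated here. The Chi-Squared score follows the formulation used by scikit-learn's `sklearn.feature_selection.chi2`: per-class feature totals are compared with the totals expected if the feature were independent of the class.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed minimal tokenizer: lowercase and split on whitespace. A real
    # Indonesian pipeline would also handle punctuation, stopwords, and stemming.
    return text.lower().split()


def tfidf(docs):
    """Return (vocab, rows), where each row maps term -> TF-IDF weight.

    Assumed variant: raw term count times smoothed idf ln((1 + N) / (1 + df)) + 1.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # documents containing each term
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vocab, rows


def chi2_scores(vocab, rows, labels):
    """Chi-squared score of every term against the class labels.

    observed[c][t]: summed weight of term t in documents of class c
    expected[c][t]: P(c) * summed weight of t over all documents
    """
    classes = sorted(set(labels))
    n = len(labels)
    class_prob = {c: labels.count(c) / n for c in classes}
    total = Counter()
    observed = {c: Counter() for c in classes}
    for row, label in zip(rows, labels):
        for t, w in row.items():
            total[t] += w
            observed[label][t] += w
    scores = {}
    for t in vocab:
        score = 0.0
        for c in classes:
            expected = class_prob[c] * total[t]
            if expected > 0:
                score += (observed[c][t] - expected) ** 2 / expected
        scores[t] = score
    return scores


def select_k_best(scores, k):
    # Keep the k highest-scoring terms; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy example (illustrative data, not the paper's corpus).
docs = ["gol pemain bola", "pemain bola menang", "ponsel baru rilis", "rilis aplikasi ponsel"]
labels = ["sport", "sport", "tech", "tech"]
vocab, rows = tfidf(docs)
scores = chi2_scores(vocab, rows, labels)
print(select_k_best(scores, 4))  # → ['bola', 'pemain', 'ponsel', 'rilis']
```

On the real corpus, `vocab` would hold the 54,091 candidate terms and `k` would be 10,000. Terms that occur in several documents of a single class score highest, which is why the shared in-class words survive the cut in the toy example.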
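The title also names Multinomial Naïve Bayes and k-fold cross-validation. The sketch below trains Multinomial Naïve Bayes with additive (Laplace) smoothing, evaluates it with k-fold cross-validation, and reports accuracy together with precision, recall, and F1-score. Several details are assumptions, not taken from the paper: `alpha = 1.0` as the smoothing default, contiguous unshuffled folds, pooling held-out predictions across folds before scoring, and macro averaging of the per-class metrics (the abstract does not say how its 96% figures were averaged). The names `MultinomialNB`, `k_fold_indices`, `macro_scores`, and `cross_validate` are hypothetical, and the six toy documents are illustrative only.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        # rows: dicts mapping feature -> non-negative count or TF-IDF weight
        self.classes = sorted(set(labels))
        vocab = sorted({f for row in rows for f in row})
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        weight = {c: Counter() for c in self.classes}
        for row, label in zip(rows, labels):
            weight[label].update(row)  # adds the row's values per feature
        self.log_lik = {}
        for c in self.classes:
            denom = sum(weight[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {f: math.log((weight[c][f] + self.alpha) / denom) for f in vocab}
        return self

    def score(self, row, c):
        # log P(c) + sum_f x_f * log P(f | c); features unseen in training are skipped
        lik = self.log_lik[c]
        return self.log_prior[c] + sum(x * lik[f] for f, x in row.items() if f in lik)

    def predict(self, rows):
        return [max(self.classes, key=lambda c: self.score(row, c)) for row in rows]


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    m = len(classes)
    return accuracy, sum(precs) / m, sum(recs) / m, sum(f1s) / m


def cross_validate(rows, labels, k=5, alpha=1.0):
    """Pool the held-out predictions of all k folds, then score them once."""
    y_true, y_pred = [], []
    for train, test in k_fold_indices(len(labels), k):
        model = MultinomialNB(alpha).fit([rows[i] for i in train], [labels[i] for i in train])
        y_pred += model.predict([rows[i] for i in test])
        y_true += [labels[i] for i in test]
    return macro_scores(y_true, y_pred)


# Toy data, interleaved so every fold sees both classes (illustrative only).
docs = ["bola gol pemain", "ponsel aplikasi rilis", "pemain bola menang",
        "rilis ponsel baru", "gol bola liga", "aplikasi ponsel baru"]
labels = ["sport", "tech"] * 3
rows = [Counter(d.split()) for d in docs]
print(cross_validate(rows, labels, k=3))  # → (1.0, 1.0, 1.0, 1.0)
```

Because `fit` accepts arbitrary non-negative weights, the same class can be trained on TF-IDF vectors restricted to the Chi-Squared-selected terms, which is the configuration the abstract describes. In practice the documents should be shuffled (or the folds stratified) before splitting, since contiguous folds over class-sorted data would leave whole classes out of training.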