Optimasi Naive Bayes Dengan Pemilihan Fitur Dan Pembobotan Gain Ratio (Naive Bayes Optimization with Feature Selection and Gain Ratio Weighting)


I Guna Adi Socrates, Afrizal Laksita Akbar, Mohammad Sonhaji Akbar, Agus Zainal Arifin, Darlis Herumurti


Naïve Bayes is one of the data mining methods commonly used in text-based document classification. Its advantage is a simple algorithm with low computational complexity. However, Naïve Bayes has a weakness: its assumption of feature independence does not always hold, which reduces classification accuracy. The method can therefore be optimized by assigning weights to its features using Gain Ratio. However, weighting the features of Naïve Bayes introduces a new problem when calculating the probability of each document, because many features in a document do not represent the tested class; weighted Naïve Bayes alone is thus still not optimal. This paper proposes an optimization of Naïve Bayes that combines Gain Ratio weighting with feature selection for text classification. The results of this study show that Naïve Bayes optimized with feature selection and weighting achieves an accuracy of 94%.
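The abstract does not give the formulas, but the two ingredients it names are standard: Gain Ratio (information gain normalized by split information) for per-feature weights, and a weighted Naïve Bayes score in which each feature's log-likelihood is multiplied by its weight. The following is a minimal sketch under that reading; the function names and the discrete-feature representation are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Gain Ratio of a discrete feature:
    (H(labels) - H(labels | feature)) / SplitInfo(feature)."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(feature_values, labels):
        groups[v].append(y)
    conditional = sum((len(g) / n) * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum((len(g) / n) * math.log2(len(g) / n)
                      for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def classify(doc_feats, priors, likelihoods, weights, classes):
    """Weighted Naive Bayes: argmax_c log P(c) + sum_i w_i * log P(x_i | c).
    Feature selection amounts to keeping only features whose weight
    (Gain Ratio) passes a threshold before calling this function."""
    best, best_score = None, float("-inf")
    for c in classes:
        score = math.log(priors[c])
        for f, v in doc_feats.items():
            # small floor avoids log(0) for unseen feature/value pairs
            score += weights.get(f, 1.0) * math.log(likelihoods[c].get((f, v), 1e-9))
        if score > best_score:
            best, best_score = c, score
    return best
```

In this sketch, a feature that perfectly separates the classes gets a Gain Ratio of 1.0, while an uninformative feature gets 0.0, so thresholding the weights doubles as the feature-selection step the paper describes.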


Article Details

How to Cite
SOCRATES, I Guna Adi et al. Optimasi Naive Bayes Dengan Pemilihan Fitur Dan Pembobotan Gain Ratio. Lontar Komputer: Jurnal Ilmiah Teknologi Informasi, [S.l.], p. 22-30, Mar. 2016. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/19506>. Date accessed: 24 Mar. 2019. doi: https://doi.org/10.24843/LKJITI.2016.v07.i01.p03.
Keywords: Data Mining; Naïve Bayes; Weighted Naïve Bayes; Gain Ratio; Feature Selection
