Naive Bayes Optimization with Feature Selection and Gain Ratio Weighting

  • I Guna Adi Socrates, Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya
  • Afrizal Laksita Akbar, Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya
  • Mohammad Sonhaji Akbar, Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya
  • Agus Zainal Arifin, Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya
  • Darlis Herumurti, Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya

Abstract

Naïve Bayes is a data mining method commonly used for text-based document classification. Its advantage is a simple algorithm with low computational complexity. However, Naïve Bayes has a weakness: its assumption that features are independent does not always hold, which affects classification accuracy. Naïve Bayes therefore needs to be optimized by weighting its features with Gain Ratio. Weighting the features, however, introduces a problem when calculating the probability of each document, because many features in a document do not represent the class being tested. As a result, weighted Naïve Bayes is still not optimal. This paper proposes optimizing Naïve Bayes with Gain Ratio weighting combined with feature selection for text classification. The results show that Naïve Bayes optimized with both feature selection and weighting achieves an accuracy of 94%.
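To make the combination described in the abstract more concrete, the following Python sketch trains a multinomial Naïve Bayes classifier in which each term's log-likelihood is scaled by its Gain Ratio, and only the highest-scoring terms are kept as features. This is an illustrative assumption, not the authors' implementation: the abstract does not give the exact weighting formula or selection criterion, so the sketch uses one common weighted form, P(c | d) ∝ P(c) ∏ P(w_i | c)^(W_i), computes Gain Ratio on binary term presence, and selects features with a hypothetical top-k cutoff `k`. The names `gain_ratio`, `train`, and `predict` and the toy corpus are also hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(docs, labels, term):
    """Gain Ratio of the binary attribute 'term present / absent' w.r.t. the class."""
    h_class = entropy(list(Counter(labels).values()))
    # Partition documents by whether the term occurs in them.
    split = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        split[term in doc][y] += 1
    n = len(docs)
    cond = sum(sum(c.values()) / n * entropy(list(c.values())) for c in split.values())
    info_gain = h_class - cond
    split_info = entropy([sum(c.values()) for c in split.values()])
    return info_gain / split_info if split_info > 0 else 0.0

def train(docs, labels, k):
    """Assumed scheme: keep the k terms with the highest Gain Ratio, use GR as weights."""
    vocab = {t for d in docs for t in d}
    weights = {t: gain_ratio(docs, labels, t) for t in vocab}
    # Feature selection: ties broken alphabetically so the result is deterministic.
    selected = set(sorted(vocab, key=lambda t: (-weights[t], t))[:k])
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    term_counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        term_counts[y].update(t for t in d if t in selected)
    log_lik = {}
    for c in classes:
        total = sum(term_counts[c].values())
        for t in selected:
            # Laplace smoothing over the selected vocabulary only.
            log_lik[c, t] = math.log((term_counts[c][t] + 1) / (total + len(selected)))
    return classes, log_prior, log_lik, {t: weights[t] for t in selected}

def predict(model, doc):
    classes, log_prior, log_lik, w = model
    def score(c):
        # log P(c) + sum over selected terms of W_t * log P(t | c); other terms are ignored.
        return log_prior[c] + sum(w[t] * log_lik[c, t] for t in doc if t in w)
    return max(classes, key=score)

docs = [
    "goal match team win".split(),
    "team coach match".split(),
    "election vote party".split(),
    "party minister vote".split(),
]
labels = ["sport", "sport", "politics", "politics"]
model = train(docs, labels, k=4)
print(predict(model, "match vote team".split()))  # prints: sport
```

In this toy corpus, terms that appear in only one document (such as "goal" or "minister") get a lower Gain Ratio than terms that perfectly separate the classes (such as "match" or "vote"), so the top-k cutoff drops them. This mirrors, under the stated assumptions, the abstract's point that removing features that do not represent a class should help a weighted Naïve Bayes classifier.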


Published
2016-03-30
How to Cite
SOCRATES, I Guna Adi et al. Optimasi Naive Bayes Dengan Pemilihan Fitur Dan Pembobotan Gain Ratio. Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, [S.l.], p. 22-30, mar. 2016. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/19506>. Date accessed: 20 apr. 2024. doi: https://doi.org/10.24843/LKJITI.2016.v07.i01.p03.
Section
Articles

Keywords

Data Mining; Naïve Bayes; Weighted Naïve Bayes; Gain Ratio; Feature Selection