Applying Discretization and the Bagging Technique to Improve the Accuracy of Ensemble-Based Classification with the C4.5 Algorithm for Diagnosing Diabetes
Abstract
In health care, data mining can be used to predict diseases such as diabetes from patient medical records. Classification is one of several data mining models, and the decision tree is among its most widely developed techniques. One popular decision tree algorithm is C4.5. This study uses the Pima Indians Diabetes dataset from the UCI Machine Learning Repository. All attributes in this dataset are continuous numeric values, so discretization is used to handle the continuous data. Accuracy is very important in classification; ensemble methods improve the accuracy of a classification algorithm by building several classifiers from the training data. The results show that applying discretization and the bagging technique to ensemble-based classification with the C4.5 algorithm increases accuracy by 6.26 percentage points, from an initial 68.61% to 74.87%.
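The pipeline described in the abstract (discretize continuous attributes, then bag a tree learner and vote) can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration: the abstract does not say which discretization method was used, so equal-width binning stands in for it; a one-level stump that splits on gain ratio (the criterion C4.5 uses) stands in for a full C4.5 tree, which would be multi-level and pruned; and the six-row toy data and all function names are hypothetical, not taken from the Pima Indians Diabetes dataset or from the study.

```python
import math
import random
from collections import Counter

def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(column, labels):
    """Information gain of splitting on `column`, divided by its split info."""
    n = len(labels)
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(x, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(column)
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0

def fit_stump(rows, labels):
    """One-level tree: split on the attribute with the highest gain ratio."""
    best = max(range(len(rows[0])),
               key=lambda j: gain_ratio([r[j] for r in rows], labels))
    default = Counter(labels).most_common(1)[0][0]
    leaves = {}
    for r, y in zip(rows, labels):
        leaves.setdefault(r[best], []).append(y)
    leaves = {k: Counter(v).most_common(1)[0][0] for k, v in leaves.items()}
    return best, leaves, default

def predict_stump(stump, row):
    attr, leaves, default = stump
    return leaves.get(row[attr], default)  # unseen bin: use the majority class

def fit_bagging(rows, labels, n_models=25, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Majority vote over the predictions of all bagged models."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two continuous attributes and a binary label.
attr_a = [80, 90, 100, 150, 160, 170]
attr_b = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]

rows = list(zip(equal_width_bins(attr_a, 2), equal_width_bins(attr_b, 2)))
models = fit_bagging(rows, labels)
predictions = [predict_bagging(models, r) for r in rows]
```

Because each model sees a different bootstrap sample, individual stumps can disagree; the vote in `predict_bagging` is what smooths out that variance. A supervised discretizer (for example ChiMerge) could replace `equal_width_bins` without changing the rest of the sketch.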
The Authors submitting a manuscript do so on the understanding that if accepted for publication, the copyright of the article shall be assigned to Jurnal Lontar Komputer as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. The reproduction of any part of this journal (printed or online) will be allowed only with written permission from Jurnal Lontar Komputer. The Editorial Board of Jurnal Lontar Komputer makes every effort to ensure that no wrong or misleading data, opinions, or statements be published in the journal.
This work is licensed under a Creative Commons Attribution 4.0 International License.