Optimizing Random Forest using Genetic Algorithm for Heart Disease Classification
Abstract
Heart disease is a leading cause of death worldwide, and effective predictive systems are needed to support the treatment of affected patients. This study aimed to improve the accuracy of Random Forest (RF) in predicting and classifying heart disease. The experiments were designed to select the optimal RF parameters using a Genetic Algorithm (GA) as the optimization technique. The RF parameters are encoded as the chromosomes of the GA's initial population. These chromosomes then undergo the GA operations of selection, crossover, and mutation, which are controlled by the crossover rate and the mutation rate. The chromosome that survives the evolution of the GA is the best individual, that is, the best set of RF parameters. The best parameters are stored in the hall of fame of the DEAP library and used for the classification process in RF. The optimized RF parameters are max_depth, max_features, n_estimators, and min_samples_leaf. For comparison, RF was also run with its default parameters and with parameters tuned by random search and grid search, which achieved accuracies of 82.5%, 82%, and 83%, respectively. RF+GA achieved an accuracy of 85.83%; this result depends on the GA parameters, namely the number of generations, the population size, the crossover rate, and the mutation rate. These results show that a Genetic Algorithm can be used to optimize the parameters of Random Forest.
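The GA loop described in the abstract can be sketched as follows. This is a minimal, self-contained illustration in plain Python rather than the study's DEAP code: the operators (tournament selection, uniform crossover, random-reset mutation, a size-one hall of fame), the search-space bounds, the 13-feature limit on max_features, and the helper names genetic_search, rf_cv_accuracy and toy_fitness are all assumptions for illustration. The fitness used here, mean cross-validated accuracy from scikit-learn's RandomForestClassifier and cross_val_score, is likewise an assumption and may differ from the evaluation used in the study.

```python
import random

# Hypothetical integer search space for the RF hyperparameters named in the
# abstract. Bounds are illustrative assumptions, not the study's settings;
# max_features <= 13 assumes the 13 predictors of the UCI heart data.
SPACE = {
    "n_estimators": (10, 300),
    "max_depth": (2, 30),
    "max_features": (1, 13),
    "min_samples_leaf": (1, 20),
}
KEYS = list(SPACE)


def genetic_search(fitness, pop_size=20, generations=15, cxpb=0.7,
                   mutpb=0.2, indpb=0.3, seed=0):
    """Maximize fitness(params_dict) over SPACE with a simple GA.

    Returns the best parameters seen in any generation (a size-one
    hall of fame) together with their fitness.
    """
    rng = random.Random(seed)
    cache = {}  # avoid re-training the model for duplicate chromosomes

    def random_gene(i):
        lo, hi = SPACE[KEYS[i]]
        return rng.randint(lo, hi)

    def evaluate(ind):
        key = tuple(ind)
        if key not in cache:
            cache[key] = fitness(dict(zip(KEYS, ind)))
        return cache[key]

    # Initial population: each chromosome is one list of RF parameter values.
    pop = [[random_gene(i) for i in range(len(KEYS))] for _ in range(pop_size)]
    scores = [evaluate(ind) for ind in pop]
    best_ind, best_score = max(zip(pop, scores), key=lambda t: t[1])
    best_ind = list(best_ind)

    for _ in range(generations):
        # Selection: tournaments of size 3 pick the parents of the next generation.
        offspring = []
        for _ in range(pop_size):
            contenders = rng.sample(range(pop_size), 3)
            winner = max(contenders, key=lambda j: scores[j])
            offspring.append(list(pop[winner]))
        # Crossover: with probability cxpb, swap each gene of a pair with p = 0.5.
        for a, b in zip(offspring[::2], offspring[1::2]):
            if rng.random() < cxpb:
                for i in range(len(KEYS)):
                    if rng.random() < 0.5:
                        a[i], b[i] = b[i], a[i]
        # Mutation: with probability mutpb, reset each gene with p = indpb.
        for ind in offspring:
            if rng.random() < mutpb:
                for i in range(len(KEYS)):
                    if rng.random() < indpb:
                        ind[i] = random_gene(i)
        pop = offspring
        scores = [evaluate(ind) for ind in pop]
        # Hall of fame: remember the best chromosome seen in any generation.
        for ind, s in zip(pop, scores):
            if s > best_score:
                best_ind, best_score = list(ind), s

    return dict(zip(KEYS, best_ind)), best_score


def rf_cv_accuracy(X, y, folds=5):
    """Hypothetical fitness factory: mean k-fold accuracy of a Random Forest.

    scikit-learn is imported lazily so the GA above runs without it.
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def fitness(params):
        clf = RandomForestClassifier(random_state=0, **params)
        return cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()

    return fitness


def toy_fitness(p):
    """Stand-in fitness peaked at an arbitrary point, used only to exercise the loop."""
    return -(abs(p["n_estimators"] - 150) + abs(p["max_depth"] - 8)
             + abs(p["max_features"] - 4) + abs(p["min_samples_leaf"] - 2))


if __name__ == "__main__":
    best, score = genetic_search(toy_fitness)
    print(best, score)
```

With the heart disease features X and labels y loaded, the search would be run as genetic_search(rf_cv_accuracy(X, y)); the toy fitness only exercises the loop without training any model. DEAP provides equivalent building blocks (tools.selTournament, tools.cxUniform, tools.mutUniformInt, tools.HallOfFame). The hand-rolled loop is used here only to make each GA step explicit.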
Authors submitting a manuscript do so on the understanding that, if it is accepted for publication, the copyright of the article shall be assigned to Jurnal Lontar Komputer as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. The reproduction of any part of this journal (printed or online) is allowed only with written permission from Jurnal Lontar Komputer. The Editorial Board of Jurnal Lontar Komputer makes every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal.
This work is licensed under a Creative Commons Attribution 4.0 International License.