Efforts of Performance Optimization: The Experiment on Ten Accounting Datasets

Zico Karya Saputra Domas; M. Rizkiawan; Roby Rakhmadi

doi:10.24843/LKJITI.2022.v13.i03.p04

Zico Karya Saputra Domas, Directorate General of Taxes of Indonesia, Jakarta, Indonesia
M. Rizkiawan, Directorate General of Taxes
Roby Rakhmadi, International Relations Department of Lampung University

Abstract

In the era of big data and digitalization, fast and accurate decision-making has become a basic need, which gives data mining a crucial role. The decision tree algorithm is commonly applied to classification tasks, but its performance must be evaluated continually so that its accuracy can be improved. Optimization methods that serve this purpose include GA-bagging, PSO-bagging, forward selection, backward elimination, SMOTE, under-sampling, GA-Adaboost, and ABSMOTE-WIGFS. In experiments on ten accounting and finance datasets, the decision tree alone achieved an average accuracy of 83.46%, an average precision of 65.64%, and an average AUC of 71.9%. Most of the optimization methods improved on this baseline, and ABSMOTE-WIGFS gave the best results, with an average accuracy of 87.71%, an average precision of 87.09%, and an average AUC of 84.87%. We conclude that these optimization methods are worth applying to accounting and finance problems to improve classification performance. Future research can test the same methods in fields outside accounting.
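As an illustration of one of the methods named in the abstract, the following is a minimal sketch of the interpolation idea behind SMOTE, assuming numeric features and Euclidean distance. It is not the implementation used in the study: the function name `smote`, its parameters, and the toy data are hypothetical, and the base sample is drawn at random for each synthetic point rather than generating a fixed number per minority sample.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration only; not the study's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a base minority sample at random (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region the minority class already occupies. The over-sampled minority class would then be combined with the majority class before the decision tree is trained.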
