The Use of XGBoost Algorithm to Analyse the Severity of Traffic Accident Victims

  • I Made Sukarsa Udayana University
  • Ni Kadek Dwi Rusjayanthi
  • Made Srinitha Millinia Utami Department of Information Technology, Udayana University
  • Ni Wayan Wisswani Department of Information System Management, Bali State Polytechnic

Abstract

Traffic accidents remain a significant contributor to a fairly high number of deaths. The Denpasar Resort Police record every traffic accident in a daily report, and these stored data can yield valuable information for improving policy and promoting better traffic practices. This research applies classification with the XGBoost and random forest algorithms, together with the SMOTE oversampling method. The results show that SMOTE can increase model accuracy. Classification with the two algorithms also identifies, through feature importance, the factors that affect the severity of traffic accident victims. Feature importance from the XGBoost model, computed by counting the weight value, was examined on three datasets: the original dataset, a dataset of two-wheeled vehicles only, and a dataset of vehicles other than two-wheeled. The results indicate that the variables influencing victim severity in road accidents are an accident time between 00.00 and 06.00, a motorcycle as the vehicle type, a truck or pickup car as the opponent vehicle, a driver aged 16 to 25, a sub-district road status, and a front-side accident type.
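The abstract reports that SMOTE can increase model accuracy. SMOTE (Synthetic Minority Over-sampling Technique) balances classes by creating synthetic minority samples at random points on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal, standard-library-only illustration that assumes purely numeric feature vectors and Euclidean distance; the paper's actual SMOTE implementation, value of k, and sampling ratio are not stated on this page, and the names `smote`, `n_synthetic`, and `seed` are hypothetical. Categorical attributes such as vehicle type would in practice need a suitable encoding or a variant such as SMOTE-NC.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper (not the paper's code): generate n_synthetic points
    by interpolating between minority-class samples and their k nearest
    minority-class neighbours. Assumes numeric feature vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)           # fixed seed for reproducibility
    k = min(k, len(minority) - 1)       # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance; keep the k nearest.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        # Place a new point at a random position on the segment base -> neighbour.
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


# Toy minority class (three 2-D points); invented values, not the paper's data.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_synthetic=4, k=2)
```

Every synthetic point lies between two real minority samples, so here all of them fall inside the triangle spanned by the three toy points. In practice a library implementation such as `imblearn.over_sampling.SMOTE` would normally be used, and oversampling should be applied only to the training split so that synthetic points do not leak into the test set.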
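The abstract ranks factors by XGBoost feature importance "by counting the weight value". In XGBoost, the `weight` importance type is the number of times a feature is used to split the data across all trees; with a trained model it can be read from `Booster.get_score(importance_type="weight")`. The sketch below shows the counting itself on a simplified, hypothetical tree representation (nodes with `split` and `children` keys, leaves with a `leaf` key). The feature names and trees are invented for illustration and are not the paper's model or results.

```python
from collections import Counter


def weight_importance(trees):
    """Count how many times each feature is used as a split across all trees
    (the 'weight' definition of feature importance). Hypothetical helper."""
    counts = Counter()
    stack = list(trees)
    while stack:
        node = stack.pop()
        if "leaf" in node:
            continue  # leaves hold a score and do not split on any feature
        counts[node["split"]] += 1
        stack.extend(node["children"])
    return counts


# Two toy trees with hypothetical feature names; not the paper's model.
trees = [
    {"split": "time_00_06", "children": [
        {"split": "vehicle_motorcycle", "children": [{"leaf": 0.4}, {"leaf": -0.2}]},
        {"leaf": -0.1},
    ]},
    {"split": "driver_age_16_25", "children": [
        {"split": "time_00_06", "children": [{"leaf": 0.3}, {"leaf": 0.0}]},
        {"leaf": -0.3},
    ]},
]

importance = weight_importance(trees)
ranking = importance.most_common()  # features sorted by split count
```

Here `time_00_06` is used in two splits and ranks first. Note that `weight` measures how often a feature is split on, not how much each split improves the model; XGBoost's `gain` and `cover` importance types capture those aspects and can produce a different ranking.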


Published
2023-10-27
How to Cite
SUKARSA, I Made et al. The Use of XGBoost Algorithm to Analyse the Severity of Traffic Accident Victims. Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, [S.l.], v. 14, n. 1, p. 36-47, oct. 2023. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/106835>. Date accessed: 03 jan. 2025. doi: https://doi.org/10.24843/LKJITI.2023.v14.i01.p04.