Comparison of Gain Ratio and Chi-Square Feature Selection Methods in Improving SVM Performance on IDS

  • Ricky Aurelius Nurtanto Diaz Udayana University
  • I Ketut Gede Darma Putra Information Technology Department, Udayana University
  • Made Sudarma Department of Electrical Engineering, Faculty of Engineering, Udayana University
  • I Made Sukarsa Information Technology Department, Udayana University
  • Naser Jawas School of Engineering, The University of Warwick

Abstract

An intrusion detection system (IDS) is a security technology that monitors a computer network or system for suspicious activity and detects potential attacks or security breaches. Accuracy is critical in an IDS, because every alert or activity it reports must be precise and measurable enough to act on. Achieving high accuracy is nevertheless difficult: the complexity of network environments and the diversity of attacks pose significant challenges for IDS development, so suitable algorithms and optimization techniques must be considered to improve accuracy. The support vector machine (SVM) is a data mining method with a high level of accuracy in classifying network packet patterns. An optimal classification process requires a feature selection stage, and such a stage can also be applied to SVM. Feature selection is an essential step in data preprocessing, since optimizing the input data can improve the performance of the SVM algorithm. This study therefore compares two feature selection algorithms, Information Gain Ratio and Chi-Square, and then classifies IDS data with the SVM algorithm using the selected features. The comparison underscores the importance of selecting the right features when developing an effective IDS.
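To make the two scoring criteria concrete, the following is a minimal Python sketch, not the authors' implementation. It ranks discrete features by gain ratio (information gain divided by split information) and by the Pearson chi-square statistic of each feature's value-by-class contingency table. It assumes categorical features, so continuous traffic features would need to be discretized first. The toy feature names `protocol` and `flag`, the labels, and the helper names `entropy`, `gain_ratio`, `chi_square`, and `top_k` are hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = defaultdict(list)  # feature value -> labels of records with that value
    for value, label in zip(feature, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A feature with a single value has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def chi_square(feature, labels):
    """Pearson chi-square statistic of the (feature value x class) contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # missing cells count as 0
    value_totals = Counter(feature)
    class_totals = Counter(labels)
    stat = 0.0
    for v, v_total in value_totals.items():
        for c, c_total in class_totals.items():
            expected = v_total * c_total / n  # always > 0 here
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def top_k(columns, labels, score, k):
    """Return the names of the k features with the highest score."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy records: "protocol" separates the classes, "flag" does not.
columns = {
    "protocol": ["tcp", "tcp", "udp", "udp", "icmp", "icmp"],
    "flag": ["SF", "S0", "SF", "S0", "SF", "S0"],
}
labels = ["normal", "normal", "normal", "normal", "attack", "attack"]

print(top_k(columns, labels, gain_ratio, 1))  # ['protocol']
print(top_k(columns, labels, chi_square, 1))  # ['protocol']
```

The selected column names would then define the reduced input passed to the SVM classifier. If scikit-learn is used instead, note that its `chi2` scorer (commonly paired with `SelectKBest`) expects non-negative feature values and computes the statistic from per-class feature sums rather than from a value-by-class contingency table, so its scores can differ from this sketch.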

Published
2024-03-29
How to Cite
DIAZ, Ricky Aurelius Nurtanto et al. Comparison of Gain Ratio and Chi-Square Feature Selection Methods in Improving SVM Performance on IDS. Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, [S.l.], v. 15, n. 1, p. 64-74, mar. 2024. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/108828>. Date accessed: 13 nov. 2024. doi: https://doi.org/10.24843/LKJITI.2024.v15.i01.p06.