Comparison of Gain Ratio and Chi-Square Feature Selection Methods in Improving SVM Performance on IDS
Abstract
An intrusion detection system (IDS) is a security technology that monitors a computer network or system for suspicious activity and detects potential attacks or security breaches. Accuracy is critical in an IDS because the response to every alert the system generates must be precise and measurable. Achieving high accuracy is difficult, however: complex network environments and the diversity of attacks pose significant challenges for IDS development, so suitable algorithms and optimization techniques need to be considered. Support vector machine (SVM) is a data mining method that classifies network data packet patterns with high accuracy. Feature selection is an essential step in the data preprocessing phase, and optimizing the input data in this way can improve the performance of the SVM algorithm. This study therefore compares two feature selection algorithms, Information Gain Ratio and Chi-Square, and then classifies IDS data using the SVM algorithm. The comparison underscores the importance of selecting the right features for developing an effective IDS.
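As a rough illustration of the two scoring criteria named in the abstract, the following minimal sketch computes gain ratio and the Pearson chi-square statistic for discrete features and ranks the features by each score. It is not the study's implementation: the function names, the toy data, and the choice to keep the top two features are all assumptions made for this example, and continuous features would first need to be discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of the feature about the labels, divided by the feature's entropy."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, f_n in feature_counts.items():
        for c, c_n in label_counts.items():
            expected = f_n * c_n / total
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


# Toy data, made up for illustration only (not taken from the study).
labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
columns = {
    "protocol": ["tcp", "tcp", "tcp", "udp", "udp", "udp"],
    "flag": ["S0", "SF", "S0", "SF", "S0", "SF"],
    "service": ["http", "http", "http", "http", "http", "http"],
}

# Keep the two best-scoring features under each criterion (k=2 is arbitrary).
top_by_gain_ratio = rank_features(columns, labels, gain_ratio)[:2]
top_by_chi_square = rank_features(columns, labels, chi_square)[:2]
```

In a full pipeline, the retained columns would then be encoded numerically and passed to an SVM classifier (for example `sklearn.svm.SVC`, assuming scikit-learn is used), and the accuracy obtained with each feature subset would be compared.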
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors submitting a manuscript do so on the understanding that, if it is accepted for publication, the copyright of the article shall be assigned to Jurnal Lontar Komputer as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. Reproduction of any part of this journal (printed or online) is allowed only with written permission from Jurnal Lontar Komputer. The Editorial Board of Jurnal Lontar Komputer makes every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal.