Spam Text Classification with the Support Vector Machine Algorithm and Chi-Square

  • Getzbie Alfredo Tpoy, Universitas Udayana
  • Agus Muliantara, Universitas Udayana

Abstract

Spam messages are messages that contain false information, commonly about events, banking, insurance, bills, advertisements, and viruses. One way to address spam is to classify received messages, separating texts that contain spam from texts that contain legitimate (ham) messages. In this study, spam text classification was performed using the Support Vector Machine (SVM) algorithm with Chi-Square feature selection. Chi-Square feature selection was applied at percentages of 20%, 40%, 60%, and 80%, and performance was measured by accuracy, precision, recall, and F1-Score. The study obtained an accuracy of 98.82% with an F1-Score of 93.05% using a feature selection percentage of 60% and the RBF kernel. With feature selection percentages of 20%, 40%, and 80%, the accuracies were 97.93%, 98.29%, and 98.02%, respectively. All of these accuracies exceeded the 97.57% accuracy obtained without feature selection.
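
The abstract names the technical ingredients of the study: Chi-Square scoring of terms, selection of a percentage of the features, an SVM with an RBF kernel, and evaluation by accuracy, precision, recall, and F1-Score. The sketch below illustrates the scoring, selection, kernel, and metrics in plain Python on four made-up messages. It is not the authors' implementation. It assumes messages are already tokenized, that each term's Chi-Square score comes from a 2x2 presence/absence contingency table against the spam/ham label, and that a "feature selection percentage" means the share of highest-scoring terms that is kept. The function names (`chi_square`, `select_top_percent`, `rbf_kernel`, `evaluate`) are hypothetical.

```python
# Minimal stdlib-only sketch; toy data and all function names are hypothetical,
# not the authors' implementation.
import math


def chi_square(n11, n10, n01, n00):
    """Chi-Square score of one term from a 2x2 term/class contingency table.

    n11: spam messages containing the term   n10: ham messages containing it
    n01: spam messages without the term      n00: ham messages without it
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:  # degenerate table, e.g. term in every message or in none
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_percent(docs, labels, percent):
    """Rank terms by Chi-Square and keep the top `percent` percent (assumed meaning).

    docs: list of token lists; labels: "spam" or "ham" per message.
    Returns (kept_terms, scores).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    doc_sets = [set(doc) for doc in docs]  # presence/absence per message
    n_spam = labels.count("spam")
    n_ham = len(labels) - n_spam
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(doc_sets, labels) if y == "spam" and term in d)
        n10 = sum(1 for d, y in zip(doc_sets, labels) if y != "spam" and term in d)
        scores[term] = chi_square(n11, n10, n_spam - n11, n_ham - n10)
    k = max(1, len(vocab) * percent // 100)  # integer percent, floored, at least 1
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:k], scores


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1-Score with spam as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    docs = [["win", "free", "prize"], ["free", "bank", "bonus"],
            ["meeting", "at", "noon"], ["lunch", "at", "noon"]]
    labels = ["spam", "spam", "ham", "ham"]
    kept, scores = select_top_percent(docs, labels, 60)
    print(kept)  # ['at', 'free', 'noon', 'bank', 'bonus']
    print(evaluate(["spam", "ham", "spam", "ham"], ["spam", "spam", "ham", "ham"]))
```

Because Chi-Square measures dependence in either direction, a term seen only in ham messages (`at`, `noon`) scores as high as one seen only in spam (`free`). Training the SVM itself is left to a library in practice: scikit-learn, for example, provides `SelectPercentile(chi2, percentile=...)` and `SVC(kernel="rbf")`, although its `chi2` works on feature values per class rather than this 2x2 presence table, and the abstract does not say which tools the authors used.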


Keywords: Chi-Square, spam, support vector machine

Published
2023-08-01
How to Cite
TPOY, Getzbie Alfredo; MULIANTARA, Agus. Klasifikasi Teks Spam dengan Algoritma Support Vector Machine dan Chi – Square [Spam Text Classification with the Support Vector Machine Algorithm and Chi-Square]. Jurnal Nasional Teknologi Informasi dan Aplikasnya, [S.l.], v. 1, n. 4, p. 1025-1034, aug. 2023. ISSN 3032-1948. Available at: <https://ojs.unud.ac.id/index.php/jnatia/article/view/102465>. Date accessed: 19 nov. 2024.
