Klasifikasi Teks Spam dengan Algoritma Support Vector Machine dan Chi – Square
Abstract
Spam messages are messages that contain false information, commonly regarding events, banking, insurance, bills, advertisements, and viruses. To address the issue of spam, classification can be performed on the received messages. Classification can be done by separating texts that contain spam messages from texts that contain legitimate (ham) messages. In this study, spam text classification was conducted using the Support Vector Machine algorithm, feature selection using Chi-Square. The Chi-Square feature selection method was performed using percentages of 20%, 40%, 60%, and 80%, with accuracy, precision, recall, and F1-Score as the measured values. The result of study obtained was an accuracy of 98.82% with an F1-Score of 93.05% at a feature selection percentage of 60%, using the RBF kernel. Feature selection with percentages of 20%, 40%, and 80% resulted in accuracies of 97.93%, 98.29%, and 98.02%, respectively. These accuracies were better compared to the accuracy without feature selection, which was 97.57%.
Keywords: Chi - Square, spam, support vector machine
This work is licensed under a Creative Commons Attribution 4.0 International License.
The Authors submitting a manuscript do so on the understanding that if accepted for publication, the copyright of the article shall be assigned to JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya) as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. The reproduction of any part of this journal (printed or online) will be allowed only with written permission from JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya). The Editorial Board of JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya) makes every effort to ensure that no wrong or misleading data, opinions, or statements be published in the journal.
This work is licensed under a Creative Commons Attribution 4.0 International License.