Hate Speech Detection in Social Media Posts with Naive Bayes Using Chi-Square Feature Selection
Abstract
The pervasive use of social media has transformed global communication, but it has also introduced challenges such as hate speech. This study proposes a Multinomial Naive Bayes model combined with Chi-square feature selection to detect hate speech efficiently in large-scale social media data. The approach identifies the text features most relevant for distinguishing hate speech from non-hate speech, using TF-IDF for feature extraction and Chi-square for feature selection. With Chi-square feature selection, the model achieved average precision, recall, F1-score, and accuracy of 92%, 92%, 91%, and 92%, respectively. Without feature selection, the model achieved 89%, 89%, 88%, and 89% on the same metrics. Feature selection therefore improved each metric by three percentage points, and the results show gains in accuracy, precision, recall, and F1-score across the various hate speech categories.
Keywords: Hate Speech, Naive Bayes, TF-IDF, Chi-square
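The pipeline described in the abstract (TF-IDF extraction, Chi-square selection, Multinomial Naive Bayes) can be sketched as follows. This is a minimal illustration that assumes scikit-learn; it is not necessarily the implementation used in the study. The toy corpus, the `build_pipeline` helper, and the choice of `k=10` retained features are hypothetical and serve only to show how the two compared configurations differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (not the study's data): 1 = hate speech, 0 = not.
texts = [
    "i hate those people they are vermin",
    "those people are vermin and should leave",
    "we hate them all they are scum",
    "they are scum and vermin",
    "what a lovely day at the park",
    "i love this new song so much",
    "the park was lovely and the food was great",
    "great song and a great day",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def build_pipeline(k=None):
    """TF-IDF -> optional Chi-square selection -> Multinomial Naive Bayes.

    k is the number of features to keep (an assumed, illustrative value);
    k=None skips selection and gives the baseline configuration.
    """
    steps = [("tfidf", TfidfVectorizer())]
    if k is not None:
        # Keep the k terms with the highest Chi-square score against the labels.
        steps.append(("chi2", SelectKBest(chi2, k=k)))
    steps.append(("nb", MultinomialNB()))
    return Pipeline(steps)


baseline = build_pipeline().fit(texts, labels)      # no feature selection
selected = build_pipeline(k=10).fit(texts, labels)  # with Chi-square selection

n_all = len(baseline.named_steps["tfidf"].vocabulary_)
n_kept = int(selected.named_steps["chi2"].get_support().sum())
predictions = selected.predict(["they are vermin", "lovely day"])

print(f"features without selection: {n_all}, with selection: {n_kept}")
```

On a real dataset, both pipelines would be evaluated on held-out data (for example with `sklearn.metrics.classification_report`) to obtain the precision, recall, F1-score, and accuracy figures that the abstract compares.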