The Influence Of Applying Stopword Removal And Smote On Indonesian Sentiment Classification

Arif Bijaksana Putra Negara

doi:10.24843/LKJITI.2023.v14.i03.p05

Arif Bijaksana Putra Negara Universitas Tanjungpura

Abstract

Public opinions and responses can be gathered from tweets on Twitter. These opinions can be expressed as sentiment, which may be positive, neutral, or negative. Sentiment analysis (opinion mining) on text can be performed through text classification. This research aims to determine the influence of applying Stopword Removal and SMOTE (Synthetic Minority Over-sampling Technique) on a sentiment classification model for Indonesian tweets. The algorithms used are Logistic Regression and Random Forest. Based on the evaluation, the best classification model was obtained with the Random Forest algorithm combined with SMOTE, with an f1-score of 75.03%. The worst model was obtained with the Random Forest algorithm combined with Stopword Removal, with an f1-score of 68.09%. For both algorithms, applying Stopword Removal lowered the resulting f1-score, whereas applying SMOTE raised it. The decrease occurs because Stopword Removal can discard information and alter the meaning of a processed tweet, causing the tweet to lose its sentiment.
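To make the two preprocessing steps concrete, the following is a minimal Python sketch and not the implementation used in the study. The stopword list, the example tweets, the function names `remove_stopwords` and `smote_like`, and the toy feature vectors are all assumptions introduced for illustration. The first part shows how removing a negation word that appears in a stopword list can make a negative tweet indistinguishable from a positive one. The second part shows the core idea behind SMOTE: a synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours.

```python
import random

# Hypothetical stopword list for illustration only; it is not the list used
# in the study. It deliberately contains the negation "tidak" ("not").
STOPWORDS = {"yang", "ini", "itu", "dan", "di", "tidak"}


def remove_stopwords(tweet, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop every token in the stopword set."""
    return " ".join(t for t in tweet.lower().split() if t not in stopwords)


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# "pelayanan ini tidak bagus" = "this service is not good" (negative)
# "pelayanan ini bagus"       = "this service is good"     (positive)
negative = remove_stopwords("pelayanan ini tidak bagus")
positive = remove_stopwords("pelayanan ini bagus")
print(negative, "|", positive)

# Toy 2-D feature vectors standing in for vectorized minority-class tweets.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(smote_like(minority, n_new=3))
```

With this assumed stopword list, both example tweets reduce to "pelayanan bagus", so a classifier can no longer tell them apart. In practice, an established implementation such as the `SMOTE` class in the imbalanced-learn library would normally be preferred over a hand-written interpolation like the one sketched here.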
