The Influence of Applying Stopword Removal and SMOTE on Indonesian Sentiment Classification
Abstract
Information such as public opinions or responses can be obtained from tweets on Twitter. These opinions can be expressed as sentiment, which may be positive, neutral, or negative. Sentiment analysis (opinion mining) of a text can be performed through text classification. This research aims to determine the influence of applying Stopword Removal and SMOTE on a sentiment classification model for Indonesian tweets. The algorithms used in this research are Logistic Regression and Random Forest. Based on the evaluation, the best classification model was obtained by combining the Random Forest algorithm with SMOTE, with an f1-score of 75.03%, while the worst was obtained by combining the Random Forest algorithm with Stopword Removal, with an f1-score of 68.09%. Applying SMOTE had a positive impact, increasing the resulting f1-score. Applying Stopword Removal with either algorithm had a negative impact, decreasing the resulting f1-score, because Stopword Removal can discard information and alter the meaning of the processed tweets, causing a tweet to lose its sentiment.
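The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword set, the example tweet, the feature vectors, and the helper names `remove_stopwords` and `smote` are all hypothetical, and the assumption that a stopword list contains the negation "tidak" is made only to show how removal can change a tweet's meaning. The SMOTE function is a simplified pure-Python version of the interpolation idea, not a library API.

```python
import random

# Hypothetical mini stopword list for illustration only; real Indonesian
# stopword lists are much longer. Including the negation "tidak" ("not")
# is an assumption made to demonstrate the loss of sentiment.
STOPWORDS = {"yang", "ini", "itu", "dan", "di", "tidak"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split on whitespace, and drop stopword tokens."""
    return " ".join(tok for tok in text.lower().split() if tok not in stopwords)


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: build n_new synthetic samples, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# "pelayanan ini tidak bagus" roughly means "this service is not good".
tweet = "pelayanan ini tidak bagus"
cleaned = remove_stopwords(tweet)
print(cleaned)  # pelayanan bagus  (the negation has been removed)

# Hypothetical feature vectors for an under-represented sentiment class.
minority_vectors = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
new_points = smote(minority_vectors, n_new=3)
print(new_points)
```

In this sketch the cleaned tweet reads as "good service", the opposite of the original, which is the kind of meaning change the abstract points to. The synthetic points lie on line segments between existing minority samples, so they enlarge the minority class without copying samples verbatim.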
This work is licensed under a Creative Commons Attribution 4.0 International License.
The Authors submitting a manuscript do so on the understanding that if accepted for publication, the copyright of the article shall be assigned to Jurnal Lontar Komputer as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. The reproduction of any part of this journal (printed or online) will be allowed only with written permission from Jurnal Lontar Komputer. The Editorial Board of Jurnal Lontar Komputer makes every effort to ensure that no wrong or misleading data, opinions, or statements be published in the journal.