The BERT Uncased and LSTM Multiclass Classification Model for Traffic Violation Text Classification

  • Komang Ayu Triana Indah Politeknik Negeri Bali
  • I Ketut Gede Darma Putra Information Technology Department Udayana University
  • I Made Sudarma Information Technology Department Udayana University
  • Rukmi Sari Hartati Electrical Engineering Department Udayana University
  • Minho Jo Department of Computer and Information Science, Korea University

Abstract

The growing volume of internet content makes it difficult for users to find relevant information through search alone. This problem can be addressed by classifying news according to its context, so that readers avoid material that is open to multiple interpretations. This research combines the Bidirectional Encoder Representations from Transformers (BERT) Uncased model with a Long Short-Term Memory (LSTM) architecture to build a text classification model that categorizes news articles about traffic violations. Data were collected by crawling an online media application's API and prepared as two datasets, one unmodified and one modified. The BERT Uncased-LSTM model with the best hyperparameter combination, a batch size of 16, a learning rate of 2e-5, and average pooling, achieved Precision, Recall, and F1 values of 97.25%, 96.90%, and 98.10%, respectively. The results show that the scores on the unmodified dataset are higher than on the modified dataset, because keeping only words with high information value in the modified dataset makes it harder for the model to understand the context of the text being classified.
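To make the described architecture concrete, the following is a minimal sketch (not the authors' code) of a BERT Uncased encoder feeding an LSTM classification head, using the hyperparameters reported in the abstract: batch size 16, learning rate 2e-5, and average pooling over the LSTM outputs. The model name, hidden size, number of classes, and sample input are illustrative assumptions.

```python
# Minimal sketch of a BERT Uncased + LSTM multiclass text classifier (PyTorch).
# Assumptions: "bert-base-uncased" checkpoint, hidden size 256, 5 label classes.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertLstmClassifier(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", hidden=256, num_classes=5):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # Token-level BERT embeddings -> LSTM -> average pooling -> class logits
        embeddings = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(embeddings)
        mask = attention_mask.unsqueeze(-1).float()
        # Average pooling over non-padding positions (the pooling reported in the abstract)
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1)
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertLstmClassifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # learning rate from the abstract

batch = tokenizer(["contoh berita pelanggaran lalu lintas"],  # illustrative input
                  padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```

In training, batches of 16 tokenized articles would be passed through this model, the cross-entropy loss computed against the traffic-violation category labels, and Precision, Recall, and F1 evaluated on a held-out split.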


Published
2025-01-31
How to Cite
INDAH, Komang Ayu Triana et al. The BERT Uncased and LSTM Multiclass Classification Model for Traffic Violation Text Classification. Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, [S.l.], v. 15, n. 02, p. 112-123, jan. 2025. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/116705>. Date accessed: 08 feb. 2025. doi: https://doi.org/10.24843/LKJITI.2024.v15.i02.p04.