Combining the MFCC and KNN Methods for Human Emotion Recognition from Speech
Abstract
Emotions are expressions through which humans respond to events that happen to them or in their surroundings. They let people convey how they feel about a situation and serve as a means of communication beyond language, since they allow humans to understand what others around them are experiencing. The voice is one such expressive channel, and it can also be used to identify the emotion a speaker is experiencing. The Mel-Frequency Cepstral Coefficient (MFCC) is a feature-extraction method widely used in speech technology; it converts a recording of the human voice into a spectral representation of the speech signal. K-Nearest Neighbor (K-NN) is a method that classifies new data based on its distance to (neighborhood with) existing data. In this study of classifying human emotions from speech, the K-Nearest Neighbor (K-NN) method proved unsuitable, achieving an accuracy of only 50%.
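To illustrate the pipeline the abstract describes, the sketch below computes MFCC features with plain NumPy and classifies them with a simple Euclidean-distance K-NN vote. This is a minimal sketch, not the study's implementation: the signal parameters, the synthetic sinusoidal "utterances", and the emotion labels are all assumptions introduced for demonstration only.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ce, hi = bins[i - 1], bins[i], bins[i + 1]
        for j in range(lo, ce):
            fb[i - 1, j] = (j - lo) / max(ce - lo, 1)
        for j in range(ce, hi):
            fb[i - 1, j] = (hi - j) / max(hi - ce, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * np.hamming(frame_len)
                       for s in starts])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_e @ dct.T  # one row of n_ceps coefficients per frame

def knn_predict(train_feats, train_labels, test_feat, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    votes = [train_labels[i] for i in np.argsort(dists)[:k]]
    return max(set(votes), key=votes.count)

# Synthetic "utterances": sinusoids at different pitches stand in for speech.
rng = np.random.default_rng(0)
def utterance(freq, sr=16000):
    t = np.arange(sr) / sr
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(sr)

# Hypothetical emotion labels -- purely illustrative, not real data.
train = [utterance(200) for _ in range(3)] + [utterance(1200) for _ in range(3)]
labels = ["calm"] * 3 + ["angry"] * 3
feats = np.array([mfcc(u).mean(axis=0) for u in train])  # average over frames
query = mfcc(utterance(200)).mean(axis=0)
print(knn_predict(feats, labels, query, k=3))
```

Averaging the per-frame coefficients gives each utterance a fixed-length feature vector, which is one common (though lossy) way to make variable-length recordings comparable under a distance metric.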
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors submitting a manuscript do so on the understanding that, if accepted for publication, the copyright of the article shall be assigned to JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya) as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. Reproduction of any part of this journal (printed or online) is allowed only with written permission from JNATIA. The Editorial Board of JNATIA makes every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal.