Chunking Phrase to Predict Pause Break in Pontianak Malay Language
Abstract
Pause breaks are one of the factors that make speech easy to understand in a Text-to-Speech system. This research aims to improve the accuracy of pause prediction in Pontianak Malay sentences by using phrase chunking, building on earlier research. It is also part of the effort to preserve Pontianak Malay as a local language and keep it from becoming extinct. The chunking method uses the RegexpParser function in the Natural Language Toolkit (NLTK) to split sentences into phrases based on their Part of Speech tags. The authors developed a new grammar and a new pause break rule, both different from those of the earlier research, to increase the accuracy of pause prediction. The data consist of 500 Pontianak Malay sentences recorded by a native speaker of Pontianak Malay, and the recordings were analyzed to determine where the pause breaks occur. A pause is either a short pause (symbolized as “/1”) or a long pause (symbolized as “/2”). Two tests were carried out: a test of pause break compatibility within a sentence, and a test using the precision, recall, and f-measure parameters. Based on these tests, the new grammar rule and pause break rule give better prediction accuracy than the earlier rule, with the proportion of correctly predicted sentences increasing by 23%.
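The two tests named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation script. It assumes that pause marks are separate whitespace-delimited tokens, that a predicted pause counts as correct only when both its position and its type (“/1” or “/2”) match the reference, and that a sentence is "compatible" when all of its pauses match. The function names and the placeholder sentences (w1, w2, ...) are hypothetical and stand in for real Pontianak Malay data.

```python
from typing import Dict, List, Set, Tuple

# Pause symbols taken from the abstract: "/1" = short pause, "/2" = long pause.
PAUSE_MARKS = {"/1": 1, "/2": 2}


def pause_positions(annotated: str) -> Set[Tuple[int, int]]:
    """Return {(words_before_pause, pause_type)} for one annotated sentence.

    Assumption: pause marks are whitespace-separated tokens.
    """
    positions = set()
    word_count = 0
    for token in annotated.split():
        if token in PAUSE_MARKS:
            positions.add((word_count, PAUSE_MARKS[token]))
        else:
            word_count += 1
    return positions


def evaluate(predicted: List[str], reference: List[str]) -> Dict[str, float]:
    """Compute pause-level precision, recall, f-measure and sentence-level accuracy."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and the same length")
    tp = fp = fn = exact = 0
    for pred, ref in zip(predicted, reference):
        p, r = pause_positions(pred), pause_positions(ref)
        tp += len(p & r)   # pauses predicted at the right place with the right type
        fp += len(p - r)   # predicted pauses with no matching reference pause
        fn += len(r - p)   # reference pauses the prediction missed
        exact += int(p == r)  # whole sentence matches (assumed compatibility test)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "sentence_accuracy": exact / len(reference),
    }


# Hypothetical placeholder sentences, not data from the study.
reference = ["w1 w2 /1 w3 w4 /2", "w1 w2 w3 /2"]
predicted = ["w1 w2 /1 w3 w4 /2", "w1 /1 w2 w3 /2"]
result = evaluate(predicted, reference)
print(result)
```

In this toy input the second predicted sentence contains one extra short pause, so precision drops to 0.75 while recall stays at 1.0, and only one of the two sentences is fully compatible. Matching on pause type as well as position is a design choice of the sketch; a looser evaluation could compare positions only.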
The Authors submitting a manuscript do so on the understanding that if accepted for publication, the copyright of the article shall be assigned to Jurnal Lontar Komputer as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. The reproduction of any part of this journal (printed or online) will be allowed only with written permission from Jurnal Lontar Komputer. The Editorial Board of Jurnal Lontar Komputer makes every effort to ensure that no wrong or misleading data, opinions, or statements be published in the journal.
This work is licensed under a Creative Commons Attribution 4.0 International License.