Lemmatization in Balinese Language

  • I Gede Angga Purnajiwa Arimbawa Udayana University Student
  • Ngurah Agus Sanjaya ER

Abstract

Lemmatization is the process of extracting the root word from an affixed word, with the aim of reducing the variations of a word to its root. Previous research on root-word extraction in Balinese has used rule-based methods to remove affixes from words. The weakness of the rule-based method is that it can only handle words that conform to the provided set of rules. However, writing in Balinese often contains typographical errors, because speakers tend to spell words the way they are spoken instead of following the correct spelling rules. In this research, we apply the Levenshtein distance method to overcome this shortcoming. When all the rules applied to a given word fail, the Levenshtein distance method is used to list all words that are "close" to it. We then select the closest word as the root word of the given input. In our experiments, the proposed method achieved an accuracy of 96.01%.
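The fallback step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `levenshtein` and `closest_root`, the `max_distance` cutoff used to decide which words count as "close", the alphabetical tie-breaking, and the toy English lexicon are all hypothetical and do not come from the paper.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_root(word: str, lexicon: Iterable[str],
                 max_distance: int = 2) -> Optional[str]:
    """Return the lexicon entry nearest to `word`, or None if none is close.

    `max_distance` is an assumed cutoff; the abstract does not state one.
    Ties are broken alphabetically, which is also an assumption.
    """
    scored = [(levenshtein(word, root), root) for root in lexicon]
    close = [pair for pair in scored if pair[0] <= max_distance]
    return min(close)[1] if close else None


# Toy lexicon of placeholder words, used only to show the mechanics.
toy_lexicon = ["book", "back", "cook"]
print(closest_root("bokk", toy_lexicon))
```

In a full lemmatizer, a function like `closest_root` would be called only after every affix-stripping rule has failed to produce a known root, with the lexicon being a dictionary of Balinese root words.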

Published
2020-01-25
How to Cite
PURNAJIWA ARIMBAWA, I Gede Angga; SANJAYA ER, Ngurah Agus. Lemmatization in Balinese Language. JELIKU (Jurnal Elektronik Ilmu Komputer Udayana), [S.l.], v. 8, n. 3, p. 235-242, jan. 2020. ISSN 2654-5101. Available at: <https://ojs.unud.ac.id/index.php/JLK/article/view/51892>. Date accessed: 07 mar. 2021. doi: https://doi.org/10.24843/JLK.2020.v08.i03.p04.