Basic Word Extraction Algorithm Based on Morphological Rules for Balinese Texts
Abstract
Stemming is the process of extracting the root word from an affixed word, with the aim of reducing the number of variant word forms. In this research, we apply stemming to the Balinese language. Previous work on Balinese stemming used a rule-based method but considered only prefixes and suffixes. Moreover, the rules were constructed without paying much attention to the morphology of the Balinese language. A rule-based method can be verified and validated with ease on simple problems, but this becomes difficult on highly complex problems such as the Balinese language. To overcome these weaknesses of rule-based stemming for Balinese, we propose a method that reduces all affix variations in Balinese by combining the rule-based approach with Balinese morphology. In our experiments, the proposed method obtained an average stemming accuracy of 99%, which is better than the 96.67% achieved by the previous method.
Keywords: Stemming, Balinese language, Rule-based