SPELLING ERROR CORRECTION IN INDONESIAN USING DAMERAU-LEVENSHTEIN DISTANCE AND N-GRAM
PERBAIKAN EJAAN KATA DALAM BAHASA INDONESIA MENGGUNAKAN ALGORITMA DAMERAU-LEVENSHTEIN DISTANCE DAN N-GRAM
Abstract
Spelling errors need attention because they can affect the results of Natural Language Processing tasks that rely on the validity of their input data. Several studies have addressed the correction of such errors. One of them, by Fahma, A. I., et al., used the n-gram method and Levenshtein distance and produced corrections with a best precision of 0.97 for insertion errors and a best recall of 1 for substitution errors. Building on that high accuracy, this study proposes combining an extension of Levenshtein distance, namely Damerau-Levenshtein distance, with n-gram methods. Damerau-Levenshtein uses the same insertion, deletion, and substitution operations and adds a transposition operation between two adjacent characters. Damerau not only distinguished these four edit operations but also stated that they account for about 80% of all human writing errors. The n-gram types used are bigrams (n = 2) and trigrams (n = 3). In testing, the precision and recall of error detection ranged from 80% to 100%. Correction accuracy was evaluated using the equations proposed by Dahlmeier and Ng; among the average precision and recall values for the three scenarios, scenario C with a top 10 ranking had the highest, at 96%.
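The two building blocks named above can be illustrated with a minimal Python sketch. It assumes the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, in which only adjacent characters are transposed, and character-level n-grams. The function names, the small word list, and the ranking by distance alone are hypothetical choices made for illustration; this is not the implementation evaluated in the study.

```python
from typing import List


def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def char_ngrams(word: str, n: int) -> List[str]:
    """Character n-grams of a word: n = 2 gives bigrams, n = 3 trigrams."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def suggest(word: str, lexicon: List[str], top_k: int = 10) -> List[str]:
    """Hypothetical ranking: the top_k lexicon entries closest to `word`."""
    ranked = sorted(lexicon, key=lambda w: (damerau_levenshtein(word, w), w))
    return ranked[:top_k]


# Assumed toy lexicon of Indonesian words, for illustration only.
lexicon = ["makan", "makin", "malam", "minum"]
print(damerau_levenshtein("mkaan", "makan"))  # a single transposition
print(char_ngrams("makan", 2))
print(suggest("mkaan", lexicon, top_k=2))
```

With plain Levenshtein distance, the swapped pair in "mkaan" would cost two edits; counting the swap as a single transposition is the difference the abstract describes. How the n-grams enter candidate selection or ranking in the study is not stated in the abstract, so the sketch only shows how they are extracted.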