PERBAIKAN KESALAHAN KATA MENGGUNAKAN KOMBINASI JARO-WINKLER & JACCARD SIMILARITY

Spelling Checker Correction Using Combination Jaro-Winkler & Jaccard Similarity

  • I Made Agus Tresna, Universitas Mataram
  • Ramaditia Dwiyansaputra, Universitas Mataram
  • Halil Akhyar, Universitas Mataram
Keywords: Natural Language Processing, Jaro-Winkler, Jaccard Similarity

Abstract

Correcting misspelled words is challenging because spelling errors take many forms. This research proposes combining two similarity algorithms to improve correction accuracy. The objective is to evaluate how each algorithm responds to different types of spelling errors and to assess how well they perform in combination. Jaro-Winkler computes the initial similarity and gives extra weight to matching word prefixes, which makes it effective against transposition and character-omission errors, particularly when the beginning of the word is critical to identifying the correct candidate. Jaccard similarity, in contrast, filters candidates by the overlap of their character sets; it captures the overall composition of a word but ignores character order. Test results show that Jaro-Winkler provides more relevant correction candidates, with higher accuracy in the 1-best (68.94%) and 5-best (90.78%) scenarios than Jaccard (55.25% and 78.42%, respectively). This difference suggests that Jaro-Winkler is better suited to the initial screening of candidates. The combination of the two algorithms proved more effective at handling different types of word errors than either algorithm used alone, resulting in a more robust overall correction mechanism.
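
The two measures described in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of Jaro-Winkler (prefix scale 0.1, prefix capped at 4 characters) and character-set Jaccard similarity. The abstract does not specify how the two scores are combined, so the `suggest` function, its `k` and `min_jaccard` parameters, and the small lexicon in the test are hypothetical and assume a simple "shortlist by Jaro-Winkler, then filter by Jaccard" scheme rather than the authors' actual implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaccard(s1: str, s2: str) -> float:
    """Jaccard similarity over character sets; ignores character order."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def suggest(word: str, lexicon: list[str], k: int = 5,
            min_jaccard: float = 0.5) -> list[str]:
    """Hypothetical combination (an assumption, not the paper's method):
    shortlist the k best candidates by Jaro-Winkler, then drop those whose
    character-set overlap with the input falls below min_jaccard."""
    shortlist = sorted(lexicon, key=lambda c: jaro_winkler(word, c),
                       reverse=True)[:k]
    return [c for c in shortlist if jaccard(word, c) >= min_jaccard]
```

In this sketch, Jaro-Winkler scores the transposed pair "MARTHA" / "MARHTA" at about 0.961, while Jaccard gives "night" and "thing" a score of 1.0 because they share the same character set. This mirrors the abstract's point that Jaccard alone cannot distinguish reorderings, which is one plausible reason to use it as a filter after an order-aware measure.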

Published
2025-03-22
Section
Articles