PERFORMANCE ANALYSIS OF MULTILINGUAL AND MONOLINGUAL MODELS IN PREDICTING INDONESIAN LANGUAGE EMOTION USING TWITTER DATASET
Abstract
Although Indonesia has the fourth largest population in the world, the number of datasets available for Indonesian text processing remains very limited. This research therefore leverages multilingual models, which can be trained on multiple languages, to predict emotions in a low-resource language such as Indonesian. Several training scenarios were conducted to evaluate the transferability and performance of these multilingual models compared to the monolingual IndoBERT model. The experimental results show that XLM-R outperforms mBERT and achieves performance competitive with IndoBERT, with XLM-R and IndoBERT achieving F1-scores of 0.7793 and 0.7733, respectively. XLM-R also demonstrates competitive results on the other evaluation metrics. These findings suggest that XLM-RoBERTa is a promising alternative for emotion detection in languages with limited resources, such as Indonesian.