Annotation Error Detection and Correction for Indonesian POS Tagging Corpus
DOI:
https://doi.org/10.24843/LKJITI.2025.v16.i01.p04
Keywords:
Annotation Error Detection, Annotation Error Correction, POS Tagging
Abstract
A linguistic corpus is the primary material for training and evaluating machine learning models, particularly for part-of-speech (POS) tagging. However, human-annotated corpora are not free from annotation errors, and such errors degrade model performance. We therefore propose a procedure for annotation error detection and correction. We detect annotation errors in an Indonesian POS tagging corpus using the n-gram variation method, then correct the corpus using an expert-voting approach. The detection step collected 6,536 annotation error candidates, each of which is either (i) an ambiguous word or (ii) an incorrect annotation. In the correction step, an expert group validated and corrected the candidates by majority voting, identifying and correcting 503 words from 1,918 sentences. Finally, we compared the performance of the POS tagging model using the corpus before and after correction. The results showed a significant improvement in F1-score (+9.69%) over the uncorrected corpus.
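The two steps named in the abstract, n-gram variation detection and majority voting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a simplified, fixed-length variant of the variation idea (a token is flagged when the same word n-gram occurs with different tags at the same position), and the function names, the toy sentences, and the tag labels (PRP, MD, NN, VB) are hypothetical.

```python
from collections import Counter, defaultdict


def variation_ngrams(sentences, n=3):
    """Flag positions inside identical word n-grams that carry more than one tag.

    sentences: list of sentences, each a list of (word, tag) pairs.
    Returns {(ngram_words, position): {tag: count}} for positions with >1 tag.
    Simplified assumption: a fixed n, no iterative growth of the context.
    """
    seen = defaultdict(Counter)
    for sent in sentences:
        words = [w.lower() for w, _ in sent]
        tags = [t for _, t in sent]
        for i in range(len(sent) - n + 1):
            gram = tuple(words[i:i + n])
            for k in range(n):
                seen[(gram, k)][tags[i + k]] += 1
    # Keep only positions where the same context shows tag variation.
    return {key: dict(c) for key, c in seen.items() if len(c) > 1}


def majority_vote(votes):
    """Return the tag chosen by most experts, or None when the top tags tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical toy corpus: "bisa" is tagged inconsistently in the same context.
corpus = [
    [("saya", "PRP"), ("bisa", "MD"), ("makan", "VB")],
    [("saya", "PRP"), ("bisa", "NN"), ("makan", "VB")],
    [("dia", "PRP"), ("suka", "VB"), ("makan", "VB")],
]

candidates = variation_ngrams(corpus, n=3)
# Hypothetical expert votes for the flagged token.
decision = majority_vote(["MD", "MD", "NN"])
```

Each flagged position is only a candidate: as the abstract notes, the variation may reflect a truly ambiguous word rather than an error, which is why a voting step follows detection. Returning None on a tie is a design choice of this sketch, leaving unresolved cases for further review.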
Published
2025-10-12
How to Cite
[1] M. Alfian, U. L. Yuhana, D. Siahaan, and H. Munazharoh, “Annotation Error Detection and Correction for Indonesian POS Tagging Corpus”, LKJITI, vol. 16, no. 01, pp. 41–52, Oct. 2025.
Section
Articles
License
Copyright (c) 2025 Lontar Komputer : Jurnal Ilmiah Teknologi Informasi

This work is licensed under a Creative Commons Attribution 4.0 International License.
