Komavigade tuvastaja [Grammar checker for detecting comma mistakes]

Krista Liin

Abstract


The aim of this grammar checker was to detect comma mistakes in written Estonian. As yet, the checker does not suggest corrections, but that function could be added to the existing system later. The grammar checker rules are written in the Constraint Grammar formalism. The corpus used for rule development and testing consists of grammatically incorrect sentences gathered from user postings on an Internet site. More than 9000 words were first morphologically and syntactically analyzed and then manually tagged for comma error detection. Finite verb forms, interrogative words and conjunctions were tagged and marked as correct or incorrect, depending on whether there was a comma mistake before the word. The 98 constraint rules were tested on a 150-sentence test corpus containing both incorrect and correct sentences. A precision of 95% and a recall of 93% were achieved on tagged words. There were no sentence-level false alarms. Failures to detect mistakes were mainly caused by incorrect spelling, incorrect earlier tagging, or situations where comma usage depends on semantic information. The results are comparable to those of other grammar checkers. In the future, the grammar checker for Estonian will be developed further using larger corpora and extended to other error types, such as agreement mistakes. The aim is also to test it on texts written by language learners and on other text types.
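For readers unfamiliar with word-level evaluation, the precision and recall figures above can be understood through the following minimal sketch. It is not the author's evaluation code: the function name `precision_recall` and the toy labels are hypothetical, and the only assumption taken from the abstract is that each tagged word is judged as either having or not having a comma mistake before it.

```python
def precision_recall(gold, predicted):
    """Compute word-level precision and recall.

    gold, predicted: equal-length lists of booleans, one entry per tagged
    word; True means "there is a comma mistake before this word".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # True positives: mistakes that exist and were flagged.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    # False positives: flagged words with no actual mistake (false alarms).
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    # False negatives: actual mistakes that were not flagged.
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels for illustration only; these are not data from the study.
gold = [True, False, True, True, False]
predicted = [True, False, False, True, True]
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision means few false alarms among the flagged words, while high recall means few real comma mistakes are missed.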





Published by / Kirjastaja:

ISSN 2504-6616 (print/trükis)

ISSN 2504-6624 (online/võrguväljaanne)