Eesti keele sõnajärje vealeidja prototüübi arendamine [The development of the prototype for an automatic word order error detector for the Estonian language]

Erika Matsak, Pille Eslon, Jaagup Kippar

Abstract


The article presents the possibilities for recognizing word order errors in Estonian, the methods used, and the current results. It concentrates on the prototype of an automatic word order error detector for Estonian developed at Tallinn University. The statistics-based program works on a method similar to n-grams, and its rules are patterns formed from 9 compulsory parts of a sentence. The set of correct word order patterns was derived from the fiction sub-corpus of Tartu University's Corpus of Written Estonian. To obtain statistically reliable results and to maximize the program's efficiency and speed, the rules were arranged in a tree structure. The prototype starts each search by finding a suitable initial tag and then looks for the compatible correct pattern with the highest frequency. At the current stage, the work focuses on detecting the correct or incorrect position of the finite/infinite verb and the predicative, since Estonian is most commonly described as a verb-second language. The prototype's efficiency was tested on texts from the Estonian learner language corpus. In the test described in the article, 5880 sentences were analyzed with the error analyzer and 300 sentences of the output were assessed; the prototype judged the correctness of the word order properly in 87.82% of the cases. Although a number of problems remain to be solved, including misspelled or unknown words (e.g. proper nouns) and erroneously unmarked clause boundaries, the method and algorithm of the prototype could also be applied to word order studies of other languages. The article concludes with a survey of the problems encountered in word order detection and possible ways to make the detector more efficient.
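The abstract describes the detector only in outline: correct word order patterns with their corpus frequencies are stored in a tree, and a search starts from an initial tag and selects the most frequent compatible pattern. The Python sketch below illustrates that idea under clearly stated assumptions. The tag labels (S, V, O, A), the `min_freq` threshold, the toy counts, and the reading of "compatible" as "the same tags in a different order" are all hypothetical; they are not the prototype's nine sentence parts, its rules, or data from the Tartu corpus. It also matches whole clause patterns rather than n-grams, so it is a simplification of the method the article names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # tag -> Node
    count: int = 0  # corpus frequency of the pattern ending exactly here


class PatternTree:
    """Prefix tree (trie) over sequences of sentence-part tags (hypothetical tag set)."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, tags, freq=1):
        """Record a correct word order pattern together with its frequency."""
        node = self.root
        for tag in tags:
            node = node.children.setdefault(tag, Node())
        node.count += freq

    def frequency(self, tags):
        """Frequency of exactly this pattern (0 if it was never recorded)."""
        node = self.root
        for tag in tags:
            node = node.children.get(tag)
            if node is None:
                return 0
        return node.count

    def best_reordering(self, tags):
        """Most frequent recorded pattern using exactly the clause's tags, in any order.

        The search starts at the root, tries each initial tag the clause
        contains, and only descends along tags that are still unused.
        """
        best, best_count = None, 0
        stack = [(self.root, (), Counter(tags))]
        while stack:
            node, path, remaining = stack.pop()
            if not +remaining:  # every tag of the clause has been placed
                if node.count > best_count:
                    best, best_count = path, node.count
                continue
            for tag, child in node.children.items():
                if remaining[tag] > 0:
                    rest = remaining.copy()
                    rest[tag] -= 1
                    stack.append((child, path + (tag,), rest))
        return best


def check_clause(tree, tags, min_freq=1):
    """Classify one clause's tag sequence as 'ok', 'error' or 'unknown'."""
    if tree.frequency(tags) >= min_freq:
        return "ok", tuple(tags)
    suggestion = tree.best_reordering(tags)
    if suggestion is None:
        return "unknown", None  # no compatible pattern, so no judgement is made
    return "error", suggestion


if __name__ == "__main__":
    tree = PatternTree()
    # Invented toy counts; S = subject, V = finite verb, O = object, A = adverbial.
    tree.add(["S", "V", "O"], 50)
    tree.add(["A", "V", "S", "O"], 30)
    tree.add(["S", "V", "A", "O"], 20)

    print(check_clause(tree, ["S", "V", "O"]))       # ('ok', ('S', 'V', 'O'))
    print(check_clause(tree, ["A", "S", "V", "O"]))  # ('error', ('A', 'V', 'S', 'O'))
```

Because the patterns share prefixes in the tree, the clause's available tags prune the search from the first step, which is one plausible reason a tree structure helps with speed. The "unknown" outcome loosely mirrors the abstract's remark that unknown words and unmarked clause boundaries leave the detector without a usable pattern.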

Keywords

morphosyntax, automatic error detection, word order errors

 





ISSN 2504-6616 (print)

ISSN 2504-6624 (online)