Eesti keele kui emakeele õppija tekstikorpus EMMA [The Estonian native-speaking students’ text corpus EMMA]

Kadri Sõrmus, Kersti Lepajõe


EMMA¹, the Estonian language learners’ text corpus being developed at the Institute of Estonian and General Linguistics of the University of Tartu, is an environment that gathers texts connected with the study processes of students learning Estonian as a native language. The article gives an overview of the basis for compiling the EMMA corpus, its character, its annotation, and the analysis and research opportunities it offers.

The corpus texts fall into four categories: examination and level test papers, student research papers, essays sent to writing contests, and other texts. The student text corpus will include texts from two school levels: high school (grades 11 and 12) and middle school (grades 8 and 9). During the first phase of corpus creation in 2013–2016, the focus of EMMA is on examination papers and level tests.

Graduation essays of high school students have been collected in Estonia since 1997, when the compulsory Estonian language exam given at the end of high school started to be graded nationally. During the period 1997–2014, all high school graduates (approx. 7,000–10,000 12th graders per year) wrote 400–600-word argumentative essays as a national examination. To build the corpus, samples from 1999, 2002, 2005, 2008, 2011 and 2014 were selected; the texts were scanned, typed in and entered into the EMMA environment, and the first annotation was added, i.e. the mistakes marked by the nationally selected graders. By 2016, the plan is to enter at least 6,000 texts, including 3,000 national examination essays (approx. 600 words per essay) and 3,000 level tests (approx. 200 words), and to make the corpus accessible to researchers through the EMMA environment.

So far, there are no electronic text corpora for analysing the language use of Estonian native-speaking students that enable quick searches and the use of contemporary research methods. Therefore, creating a corpus of Estonian native-language learners’ texts is an important step in providing researchers of students’ papers and other researchers with trustworthy primary material and in creating more analysis opportunities. Hopefully, the language learners’ text corpus EMMA will fulfil its goals, contribute to research on texts written by Estonian native-language students and, through research outcomes, contribute to the quality of native language teaching and teaching materials.

¹ https://
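The planned first-phase figures given in the abstract can be turned into a rough size estimate. The sketch below is only illustrative: it uses the approximate per-text lengths stated in the abstract (reading "approx. 200 words" as words per level test), treats the planned 6,000 texts as a lower bound, and all variable and function names are hypothetical rather than part of the EMMA environment.

```python
# Rough size estimate for the planned first phase of the corpus.
# Figures are the approximate ones from the abstract; names are illustrative.
PLANNED_TEXTS = {
    # category: (number of texts, approx. words per text)
    "national examination essays": (3000, 600),
    "level tests": (3000, 200),  # assumption: 200 words per level test
}


def estimate_words(plan):
    """Return the approximate total word count for a {category: (texts, words)} plan."""
    return sum(count * words for count, words in plan.values())


total_texts = sum(count for count, _ in PLANNED_TEXTS.values())
total_words = estimate_words(PLANNED_TEXTS)

# The sampled examination years listed in the abstract are three years apart.
sample_years = list(range(1999, 2015, 3))

print(f"at least {total_texts} texts, roughly {total_words:,} words")
print("sampled examination years:", sample_years)
```

Under these assumptions the first phase would amount to roughly 2.4 million words; the real figure depends on the actual text lengths and on how many texts are entered beyond the planned minimum.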


language corpora, students’ language use, language learning; L1 teaching, testing and assessment, test development


Published by / Kirjastaja:

ISSN 2504-6616 (print/trükis)

ISSN 2504-6624 (online/võrguväljaanne)