Poor man’s lemmatisation for automatic error classification
Date
2015-04-11
Author
Popovic, Maja
Arcan, Mihael
Avramidis, Eleftherios
Burchardt, Aljoscha
Lommel, Arle
Recommended Citation
Popovic, Maja, Arcan, Mihael, Avramidis, Eleftherios, Burchardt, Aljoscha, & Lommel, Arle. (2015). Poor man’s lemmatisation for automatic error classification. Paper presented at the 18th Annual Conference of the European Association for Machine Translation (EAMT 2015), Antalya, Turkey, 11-13 May.
Abstract
This paper demonstrates that an existing automatic error classifier for machine translation output can be made independent of lemmatisation. This makes it usable for smaller and under-resourced languages, and in situations where no lemmatiser is at hand. Truncating every word to its first four letters is shown to be the best method, even for highly inflected languages: it preserves the detected distribution of error types both within a single translation output and across different translation outputs.
The main cost of not using a lemmatiser is lower accuracy in detecting the inflectional error class, because inflectional errors are confused with mistranslations. For shorter words, actual inflectional errors are tagged as mistranslations; for longer words, actual mistranslations are tagged as inflectional errors. With this in mind, the error classifier can be used without target-language lemmatisation, and inflectional and lexical error rates can be extrapolated according to the average word length in the analysed text.
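
The four-letter truncation described in the abstract, and the confusion between inflectional errors and mistranslations that it causes, can be illustrated with a minimal sketch. The function names `pseudo_lemma` and `classify_mismatch`, the lower-casing step, and the word-pair decision rule are assumptions made for illustration only; they are not the classifier's actual algorithm, which the abstract does not specify.

```python
def pseudo_lemma(word: str, prefix_len: int = 4) -> str:
    """Stand-in for a lemma: the first `prefix_len` letters of the word.

    Lower-casing is an assumption of this sketch, not stated in the abstract.
    """
    return word.lower()[:prefix_len]


def classify_mismatch(hyp_word: str, ref_word: str, prefix_len: int = 4) -> str:
    """Toy rule for a hypothesis/reference word pair (hypothetical helper).

    Same full form         -> "match"
    Same pseudo-lemma only -> "inflection"
    Different pseudo-lemma -> "lexical"
    """
    if hyp_word.lower() == ref_word.lower():
        return "match"
    if pseudo_lemma(hyp_word, prefix_len) == pseudo_lemma(ref_word, prefix_len):
        return "inflection"
    return "lexical"


# A regular inflectional error is recognised as such.
print(classify_mismatch("houses", "house"))           # inflection
# Short word: a true inflectional error is tagged as a mistranslation.
print(classify_mismatch("men", "man"))                # lexical
# Longer words: a true mistranslation is tagged as an inflectional error.
print(classify_mismatch("translation", "transport"))  # inflection
```

The last two calls show the two directions of confusion named in the abstract: the prefix is too long to absorb a change inside a short word, and too short to separate long words that happen to share their first four letters.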