Poor man’s lemmatisation for automatic error classification

Popovic, Maja; Arcan, Mihael; Avramidis, Eleftherios; Burchardt, Aljoscha; Lommel, Arle

Date

2015-04-11

Recommended Citation

Popovic, Maja, Arcan, Mihael, Avramidis, Eleftherios, Burchardt, Aljoscha, & Lommel, Arle. (2015). Poor man’s lemmatisation for automatic error classification. Paper presented at the 18th Annual Conference of the European Association for Machine Translation (EAMT2015), Antalya, Turkey, 11-13 May.

Published Version

https://aclanthology.info/papers/W15-4914/w15-4914

Abstract

This paper demonstrates that an existing automatic error classifier for machine translation output can be made independent of lemmatisation. This makes the classifier usable for smaller and under-resourced languages and in situations where no lemmatiser is available. Truncating all words to their first four letters is shown to be the best method, even for highly inflected languages: it preserves the detected distribution of error types both within a single translation output and across different translation outputs. The main cost of not using a lemmatiser is lower accuracy in detecting the inflectional error class, because it is confused with mistranslations. For shorter words, actual inflectional errors are tagged as mistranslations; for longer words, the reverse holds and mistranslations are tagged as inflectional errors. With this in mind, the error classifier can be used without target-language lemmatisation, and inflectional and lexical error rates can be extrapolated according to the average word length in the analysed text.
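
The four-letter truncation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' classifier: the function names (pseudo_lemma, pseudo_lemmatise, compare_forms), the whitespace tokenisation, and the three-way labelling are assumptions made for illustration only. The sketch compares a hypothesis word with a reference word and shows how the truncated prefix stands in for a lemma.

```python
def pseudo_lemma(word, prefix_len=4):
    """Return the first `prefix_len` characters of `word` as a stand-in lemma.

    Words shorter than `prefix_len` are returned unchanged.
    """
    return word[:prefix_len]


def pseudo_lemmatise(sentence, prefix_len=4):
    """Truncate every token of a sentence.

    Whitespace tokenisation is an assumption of this sketch.
    """
    return [pseudo_lemma(token, prefix_len) for token in sentence.split()]


def compare_forms(hyp_word, ref_word, prefix_len=4):
    """Label a hypothesis/reference word pair (hypothetical helper).

    - "match":      the full forms are identical
    - "inflection": the full forms differ but the truncated forms agree
    - "lexical":    the truncated forms differ
    """
    if hyp_word == ref_word:
        return "match"
    if pseudo_lemma(hyp_word, prefix_len) == pseudo_lemma(ref_word, prefix_len):
        return "inflection"
    return "lexical"


if __name__ == "__main__":
    print(pseudo_lemmatise("translations were translated"))
    for hyp, ref in [("houses", "house"), ("ran", "run"),
                     ("translation", "transport")]:
        print(hyp, ref, compare_forms(hyp, ref))
```

Under these assumptions, "houses" and "house" share the prefix "hous" and are labelled as an inflectional difference. The pair "ran" and "run" shows the short-word confusion mentioned in the abstract: a real inflectional difference is labelled lexical because the truncated forms differ. The pair "translation" and "transport" shows the long-word confusion: two different words share the prefix "tran" and are labelled inflectional.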

URI

http://hdl.handle.net/10379/14902

Collections

Data Science Institute (Conference Papers)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland