Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages
Date
2015-05-11
Author
Popovic, Maja
Arcan, Mihael
Recommended Citation
Popovic, Maja, & Arcan, Mihael. (2015). Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages. Paper presented at the 18th Annual Conference of the European Association for Machine Translation (EAMT2015), Antalya, Turkey, 11-13 May.
Abstract
The best way to improve a statistical machine translation system is to identify concrete problems causing translation errors and address them. Many of these problems are related to the characteristics of the involved languages and differences between them. This work explores the main obstacles for statistical machine translation systems involving two morphologically rich and under-resourced languages, namely Serbian and Slovenian. Systems are trained for translations from and into English and German using parallel texts from different domains, including both written and spoken language. It is shown that for all translation directions structural properties concerning multi-noun collocations and exact phrase boundaries are the most difficult for the systems, followed by negation, preposition and local word order differences. For translation into English and German, articles and pronouns are the most problematic, as well as disambiguation of certain frequent functional words. For translation into Serbian and Slovenian, cases and verb inflections are most difficult. In addition, local word order involving verbs is often incorrect and verb parts are often missing, especially when translating from German.