Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages

Popovic, Maja; Arcan, Mihael

Date

2015-05-11

Author

Popovic, Maja

Arcan, Mihael

Recommended Citation

Popovic, Maja, & Arcan, Mihael. (2015). Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages. Paper presented at the 18th Annual Conference of the European Association for Machine Translation (EAMT2015), Antalya, Turkey, 11-13 May.

Published Version

https://aclanthology.info/papers/W15-4913/w15-4913

Abstract

The best way to improve a statistical machine translation system is to identify concrete problems causing translation errors and address them. Many of these problems are related to the characteristics of the involved languages and differences between them. This work explores the main obstacles for statistical machine translation systems involving two morphologically rich and under-resourced languages, namely Serbian and Slovenian. Systems are trained for translations from and into English and German using parallel texts from different domains, including both written and spoken language. It is shown that for all translation directions structural properties concerning multi-noun collocations and exact phrase boundaries are the most difficult for the systems, followed by negation, preposition and local word order differences. For translation into English and German, articles and pronouns are the most problematic, as well as disambiguation of certain frequent functional words. For translation into Serbian and Slovenian, cases and verb inflections are most difficult. In addition, local word order involving verbs is often incorrect and verb parts are often missing, especially when translating from German.

URI

http://hdl.handle.net/10379/14907

Collections

Data Science Institute (Conference Papers)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland