Language related issues for machine translation between closely related South Slavic languages

Date
2016-12-12
Author
Popovic, Maja
Arcan, Mihael
Klubicka, Filip
Recommended Citation
Popovic, Maja, Arcan, Mihael, & Klubicka, Filip. (2016). Language related issues for machine translation between closely related South Slavic languages. Paper presented at the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial-3), Osaka, Japan, 12 December.
Abstract
Machine translation between closely related languages is less challenging and exhibits a smaller number of translation errors than translation between distant languages, but there are still obstacles which should be addressed in order to improve such systems. This work explores the obstacles for machine translation systems between closely related South Slavic languages, namely Croatian, Serbian and Slovenian. Statistical systems for all language pairs and translation directions are trained using parallel texts from different domains, though mainly on spoken language, i.e. subtitles. For translation between Serbian and Croatian, a rule-based system is also explored. It is shown that for all language pairs and for both translation systems, the main obstacles are the differences between syntactic properties.