Identification of bilingual terms from monolingual documents for statistical machine translation
Date
2014-08-23
Author
Arcan, Mihael
Giuliano, Claudio
Turchi, Marco
Buitelaar, Paul
Recommended Citation
Arcan, Mihael, Giuliano, Claudio, Turchi, Marco, & Buitelaar, Paul. (2014). Identification of bilingual terms from monolingual documents for statistical machine translation. Paper presented at the 4th International Workshop on Computational Terminology (Computerm2014), co-located with COLING 2014, Dublin, Ireland, 23 August.
Abstract
The automatic translation of domain-specific documents is often a hard task for generic Statistical Machine Translation (SMT) systems, which cannot correctly translate the large number of terms encountered in the text. In this paper, we address the automatic identification of bilingual terminology using Wikipedia as a lexical resource, and its integration into an SMT system. The correct translation equivalent of each disambiguated term identified in the monolingual text is obtained from the multilingual versions of Wikipedia. This approach is compared to the bilingual terminology provided by the Terminology as a Service (TaaS) platform. The small number of high-quality domain-specific terms is passed to the SMT system using the XML markup and Fill-Up model methods, which produced a relative translation improvement of up to 13% in BLEU score.
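
As an illustration of the XML markup idea mentioned in the abstract, the following minimal sketch wraps known source terms in a tag that carries their suggested translation. It is not the authors' implementation. The function name `annotate_terms`, the tag name `term`, and the example term pairs are hypothetical. The `translation` attribute follows the convention used for XML input in Moses-style decoders, and the abstract does not state which decoder or tag name was used.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def annotate_terms(sentence, term_table, tag="term"):
    """Wrap known source terms in XML markup carrying their translation.

    term_table maps source terms to target translations (hypothetical data,
    e.g. pairs harvested from a bilingual terminology resource).
    """
    if not term_table:
        return escape(sentence)

    # Case-insensitive lookup; keys are lowercased source terms.
    lookup = {src.lower(): tgt for src, tgt in term_table.items()}

    # Longest terms first, so multi-word terms win over their substrings.
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    pieces = []
    pos = 0
    for match in pattern.finditer(sentence):
        # Copy the untouched text before the term, escaping XML characters.
        pieces.append(escape(sentence[pos:match.start()]))
        translation = lookup[match.group(0).lower()]
        pieces.append(
            f"<{tag} translation={quoteattr(translation)}>"
            f"{escape(match.group(0))}</{tag}>"
        )
        pos = match.end()
    pieces.append(escape(sentence[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    # Invented example terms, for illustration only.
    table = {"interest rate": "Zinssatz", "rate": "Rate"}
    print(annotate_terms("The interest rate rose.", table))
```

Matching the longest term first is a design choice in this sketch: it keeps a multi-word term such as "interest rate" from being split by a shorter entry such as "rate". How the decoder treats the suggested translation (as a hard constraint or as one option competing with the phrase table) depends on the decoder's configuration and is outside the scope of this example.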