dc.contributor.author | Arcan, Mihael | |
dc.contributor.author | Turchi, Marco | |
dc.contributor.author | Tonelli, Sara | |
dc.contributor.author | Buitelaar, Paul | |
dc.date.accessioned | 2019-02-07T15:23:14Z | |
dc.date.available | 2019-02-07T15:23:14Z | |
dc.date.issued | 2014-10-22 | |
dc.identifier.citation | Arcan, Mihael, Turchi, Marco, Tonelli, Sara, & Buitelaar, Paul. (2014). Enhancing statistical machine translation with bilingual terminology in a CAT environment. Paper presented at the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), Vancouver, Canada, 22-26 October. | en_IE |
dc.identifier.uri | http://hdl.handle.net/10379/14924 | |
dc.description.abstract | In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity. Therefore, we investigate several strategies to extract and align bilingual terminology, and to embed it into the SMT. We compare two embedding methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and the cache-based model. We tested our framework on two different domains showing improvements up to 15% BLEU score points. | en_IE |
dc.description.sponsorship | We would like to thank Dr. Ahmet Aker and Mārcis Pinnis for providing us with their newest software and the technical support for it. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 and by the European Union supported project MateCat (ICT-2011.4.2-287688). | en_IE |
dc.format | application/pdf | en_IE |
dc.language.iso | en | en_IE |
dc.publisher | Association for Machine Translation in the Americas | en_IE |
dc.relation.ispartof | Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014) | en |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Ireland | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ | |
dc.subject | Machine translation | en_IE |
dc.subject | Bilingual terminology | en_IE |
dc.subject | CAT environment | en_IE |
dc.title | Enhancing statistical machine translation with bilingual terminology in a CAT environment | en_IE |
dc.type | Conference Paper | en_IE |
dc.date.updated | 2019-01-23T17:59:55Z | |
dc.local.publishedsource | https://www.amtaweb.org/AMTA2014Proceedings/AMTA2014Proceedings_ResearchTrack_final.pdf | en_IE |
dc.description.peer-reviewed | non-peer-reviewed | |
dc.contributor.funder | Science Foundation Ireland | en_IE |
dc.contributor.funder | Seventh Framework Programme | en_IE |
dc.internal.rssid | 13192033 | |
dc.local.contact | Mihael Arcan. Email: mihael.arcan@insight-centre.org | |
dc.local.copyrightchecked | Yes | |
dc.local.version | PUBLISHED | |
dcterms.project | info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ | en_IE |
dcterms.project | info:eu-repo/grantAgreement/EC/FP7::SP1::ICT/287688/EU/Machine Translation Enhanced Computer Assisted Translation/MATECAT | en_IE |
nui.item.downloads | 103 | |