
dc.contributor.author: Arcan, Mihael
dc.contributor.author: Turchi, Marco
dc.contributor.author: Tonelli, Sara
dc.contributor.author: Buitelaar, Paul
dc.date.accessioned: 2019-02-07T15:23:14Z
dc.date.available: 2019-02-07T15:23:14Z
dc.date.issued: 2014-10-22
dc.identifier.citation: Arcan, Mihael, Turchi, Marco, Tonelli, Sara, & Buitelaar, Paul. (2014). Enhancing statistical machine translation with bilingual terminology in a CAT environment. Paper presented at the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), Vancouver, Canada, 22-26 October. [en_IE]
dc.identifier.uri: http://hdl.handle.net/10379/14924
dc.description.abstract: In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity. Therefore, we investigate several strategies to extract and align bilingual terminology, and to embed it into the SMT. We compare two embedding methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and the cache-based model. We tested our framework on two different domains showing improvements up to 15% BLEU score points. [en_IE]
dc.description.sponsorship: We would like to thank Dr. Ahmet Aker and Mārcis Pinnis for providing us with their newest software and the technical support for it. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 and by the European Union supported project MateCat (ICT-2011.4.2-287688). [en_IE]
dc.format: application/pdf [en_IE]
dc.language.iso: en [en_IE]
dc.publisher: Association for Machine Translation in the Americas [en_IE]
dc.relation.ispartof: Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014) [en]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject: Machine translation [en_IE]
dc.subject: Bilingual terminology [en_IE]
dc.subject: CAT environment [en_IE]
dc.title: Enhancing statistical machine translation with bilingual terminology in a CAT environment [en_IE]
dc.type: Conference Paper [en_IE]
dc.date.updated: 2019-01-23T17:59:55Z
dc.local.publishedsource: https://www.amtaweb.org/AMTA2014Proceedings/AMTA2014Proceedings_ResearchTrack_final.pdf [en_IE]
dc.description.peer-reviewed: non-peer-reviewed
dc.contributor.funder: Science Foundation Ireland [en_IE]
dc.contributor.funder: Seventh Framework Programme [en_IE]
dc.internal.rssid: 13192033
dc.local.contact: Mihael Arcan. Email: mihael.arcan@insight-centre.org
dc.local.copyrightchecked: Yes
dc.local.version: PUBLISHED
dcterms.project: info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ [en_IE]
dcterms.project: info:eu-repo/grantAgreement/EC/FP7::SP1::ICT/287688/EU/Machine Translation Enhanced Computer Assisted Translation/MATECAT [en_IE]
nui.item.downloads: 103



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland