Multilingual multimodal machine translation for Dravidian languages utilizing phonetic transcription
Date
2019-08-19
Author
Chakravarthi, Bharathi Raja
Priyadharshini, Ruba
Stearns, Bernardo
Jayapal, Arun
Sridevy, S.
Arcan, Mihael
Zarrouk, Manel
McCrae, John P.
Usage
This item's downloads: 287
Recommended Citation
Chakravarthi, Bharathi Raja, Priyadharshini, Ruba, Stearns, Bernardo, Jayapal, Arun, Sridevy, S., Arcan, Mihael, Zarrouk, Manel, McCrae, John P. (2019). Multilingual multimodal machine translation for Dravidian languages utilizing phonetic transcription. Paper presented at LoResMT 2019: the 2nd Workshop on Technologies for MT of Low Resource Languages at MT Summit XVII, Dublin, Ireland, 19-23 August.
Abstract
Multimodal machine translation is the task of translating from a source text into the target language using information from other modalities. Existing multimodal datasets have been restricted to highly resourced languages. In addition, these datasets were collected by manually translating the English descriptions of the Flickr30K dataset. In this work, we introduce MMDravi, a multilingual multimodal dataset for under-resourced Dravidian languages. It comprises 30,000 sentences created utilizing several machine translation outputs. Using data from MMDravi and a phonetic transcription of the corpus, we build a Multilingual Multimodal Neural Machine Translation system (MMNMT) for closely related Dravidian languages to take advantage of the multilingual corpus and other modalities.
We evaluate the translations generated by the proposed approach against a human-annotated evaluation dataset in terms of the BLEU, METEOR, and TER metrics. Relying on multilingual corpora, phonetic transcription, and image features, our approach improves translation quality for the under-resourced languages.
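For the surface-level metrics, a minimal corpus-scoring sketch with the sacreBLEU package is shown below (an assumed tool choice; METEOR needs a separate scorer, such as the one in nltk, and the example strings are invented).

    # Minimal sketch: corpus-level BLEU and TER with sacreBLEU.
    # Requires: pip install sacrebleu
    import sacrebleu

    hypotheses = ["a dog runs across the field"]           # system outputs
    references = [["a dog is running through the field"]]  # one reference stream

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    ter = sacrebleu.corpus_ter(hypotheses, references)
    print(f"BLEU: {bleu.score:.2f}  TER: {ter.score:.2f}")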