Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation - CEA - Commissariat à l’énergie atomique et aux énergies alternatives Access content directly
Conference Papers Year : 2018

Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation

Abstract

Domain adaptation consists in adapting Machine Translation (MT) systems designed for one domain to work in another. Multiword expressions generally characterize specific-domains vocabularies. Translating multiword expressions is a challenge for current Statistical Machine Translation (SMT) systems because corpus-based approaches are effective only when large amounts of parallel corpora are available. However, parallel corpora are only available for a limited number of language pairs and domains, and the process of building corpora for several language pairs and domains is time consuming and expensive. This paper describes an experimental evaluation of the impact of using a specialized bilingual lexicon of multiword expressions in order to obtain better domain adaptation for the state of the art statistical machine translation system Moses. Our study concerns the English-French language pair and two kinds of texts: in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents). We introduce three methods to integrate extracted bilingual multiword expressions in Moses. We experimentally show that integrating specialized bilingual lexicons of multiword expressions improve translation quality of Moses for both in-domain and out-of-domain texts.
Not file

Dates and versions

cea-01772655 , version 1 (20-04-2018)

Identifiers

Cite

Nasredine Semmar, Meriama Laib. Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation. PACLING 2017: International Conference of the Pacific Association for Computational Linguistics, Aug 2017, Yangon, Myanmar (Burma). ⟨10.1007/978-981-10-8438-6_9⟩. ⟨cea-01772655⟩
70 View
0 Download

Altmetric

Share

Gmail Facebook Twitter LinkedIn More