Building specialized bilingual lexicons using large-scale background knowledge

Abstract: Bilingual lexicons are central components of machine translation and cross-lingual information retrieval systems. Their manual construction requires strong expertise in both languages involved and is a costly process. Several automatic methods have been proposed as an alternative, but they often rely on resources available for only a limited number of languages, and their performance still falls far short of the quality of manual translations. We introduce a novel approach to the creation of domain-specific bilingual lexicons that relies on Wikipedia. This massively multilingual encyclopedia makes it possible to create lexicons for a large number of language pairs. Wikipedia is used to extract domains in each language, to link domains across languages, and to create generic translation dictionaries. The approach is tested on four specialized domains and compared to three state-of-the-art approaches on two language pairs: French-English and Romanian-English. The newly introduced method compares favorably to existing methods in all tested configurations.
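The abstract mentions that Wikipedia is used to link content across languages and to create generic translation dictionaries. A minimal sketch of one plausible building block, assuming interlanguage-link pairs are available (e.g. from a Wikipedia dump's langlinks table — the sample data and function names below are hypothetical, not taken from the paper):

```python
# Hedged sketch: turning Wikipedia interlanguage links into a seed
# bilingual dictionary. The `langlinks` sample is a hypothetical toy
# example; real input would come from a Wikipedia dump.

langlinks = {
    # French article title -> English article title (illustrative only)
    "Diabète": "Diabetes",
    "Cancer du sein": "Breast cancer",
    "Éolienne": "Wind turbine",
}

def build_seed_lexicon(links):
    """Map title-level interlanguage links to lowercased
    (source, target) translation entries. Multiword titles
    yield phrase-level entries; single words yield word-level ones."""
    lexicon = {}
    for src_title, tgt_title in links.items():
        lexicon[src_title.lower()] = tgt_title.lower()
    return lexicon

seed = build_seed_lexicon(langlinks)
print(seed["diabète"])        # -> diabetes
print(seed["cancer du sein"]) # -> breast cancer
```

In the paper's actual pipeline such title pairs would be one of several inputs (alongside domain extraction and domain linking); this sketch only illustrates the dictionary-construction idea.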
Document type: Conference papers

https://hal-cea.archives-ouvertes.fr/cea-01844695
Contributor: Léna Le Roy
Submitted on: Thursday, July 19, 2018 - 3:57:28 PM
Last modification on: Saturday, May 4, 2019 - 1:21:27 AM

Identifiers

  • HAL Id: cea-01844695, version 1

Citation

D. Bouamor, A. Popescu, N. Semmar, P. Zweigenbaum. Building specialized bilingual lexicons using large-scale background knowledge. 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Oct 2013, Seattle, United States. pp. 479-489. ⟨cea-01844695⟩
