Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach - CEA - Commissariat à l’énergie atomique et aux énergies alternatives
Conference paper, Year: 2018

Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach

Abstract

Automatic identification of Arabic dialects in a text is a difficult task, especially for Maghreb languages, whether they are written in Arabic script or in Latin characters (Arabizi). Such texts are characterized by code-switching between Modern Standard Arabic (MSA) and Arabic Dialect (AD) when written in Arabic script, or between Arabizi and foreign languages when written in Latin characters. This paper presents the specific resources and tools we have developed for this purpose, with a focus on the transliteration of Arabizi into Arabic (using the dedicated tools for Arabic dialects). A dictionary-based approach to detecting the dialectal origin of a text is described; it exhibits satisfactory results.
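The abstract does not detail the dictionary-based approach, so the following is only a minimal sketch of the general idea, assuming one word list per dialect and a simple count of matching tokens. The lexicon entries, the `LEXICONS` table, and the `identify_dialect` function are hypothetical illustrations, not the resources or the scoring method of the paper.

```python
import re
from collections import Counter

# Toy per-dialect word lists in Arabizi (illustrative assumption only;
# these are NOT the dictionaries built by the authors).
LEXICONS = {
    "moroccan": {"daba", "bezzaf", "wach"},
    "algerian": {"dork", "wesh", "bezzaf"},
    "tunisian": {"tawa", "barcha", "chnowa"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def identify_dialect(text, lexicons=LEXICONS):
    """Return (dialect, scores), where dialect is None if undecided.

    Each dialect is scored by the number of tokens found in its word
    list. The result is None when no token matches any list or when
    the best score is shared by several dialects.
    """
    tokens = tokenize(text)
    scores = Counter()
    for dialect, lexicon in lexicons.items():
        scores[dialect] = sum(1 for token in tokens if token in lexicon)

    best_score = max(scores.values())
    winners = [d for d, s in scores.items() if s == best_score]
    if best_score == 0 or len(winners) > 1:
        return None, dict(scores)
    return winners[0], dict(scores)


if __name__ == "__main__":
    label, scores = identify_dialect("tawa barcha")
    print(label, scores)
```

A word shared by several lists (such as `bezzaf` in this toy table) cannot decide on its own, which is why ties are reported as undecided rather than resolved arbitrarily. A realistic system would also need to handle code-switched tokens and spelling variation in Arabizi, which this sketch ignores.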
Main file
580.pdf (217.52 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02012150, version 1 (08-02-2019)

Identifiers

  • HAL Id: hal-02012150, version 1

Cite

Houda Saadane, Hosni Seffih, Christian Fluhr, Khalid Choukri, Nasredine Semmar. Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. ⟨hal-02012150⟩
151 views
123 downloads
