CEA LIST's participation at the CLEF CHiC 2013

Abstract: For our first participation in the CLEF CHiC Lab, we submitted runs to the multilingual ad-hoc and multilingual semantic enrichment tasks. Given the strongly multilingual character of the evaluation corpus, the main objectives of the experiments were to test the efficiency of semantic topic expansion and consolidation based on versions of Explicit Semantic Analysis (ESA) in different languages. Another objective was the multilingual fusion of results obtained in the different languages of the corpus. ESA was adapted for the 10 languages that are best represented in the Europeana corpus. Wikipedia dumps from March 2012 were used for French and English, and dumps from March 2013 for the other languages. One problem that arises when modeling short documents, such as queries, with classical ESA vectors is that no information is available about whether a concept is related to the entire topic or only to a part of it. To overcome this problem, we propose an adaptation of ESA (ESA-C) that favors the Wikipedia concepts linked to the highest number of concepts from the original topic. In the ad-hoc task, ESA and ESA-C have two roles: to expand the topic with related concepts, and to create consolidated topic models that contain the original topic words along with other related keywords. We submitted both monolingual and multilingual runs, both without topic expansion and with topic expansion and consolidation using classical ESA and ESA-C. The best results are obtained in a multilingual setting with no expansion and ESA-C topic consolidation. For the semantic enrichment task, we propose lists of related Wikipedia concepts using either a monolingual ranking or a voting scheme that surfaces the related concepts appearing in the largest number of languages. Here, the best results are obtained in a monolingual ranking configuration that exploits ESA-C.
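The abstract does not give the exact ESA-C scoring formula, so the Python sketch below is only one possible reading of the ideas it names. It is not the authors' implementation. Classical ESA is shown as the sum of per-word concept vectors. The ESA-C-style ranking is shown as an assumption: concepts are ranked first by how many distinct topic words support them, so that a concept related to only part of the topic is demoted. Consolidation appends the top concept titles to the original words. The cross-language vote counts in how many languages a concept appears. `TOY_INDEX`, `TOY_RANKINGS`, `top_k`, `n_extra`, `top_n` and all function names are hypothetical. The concept IDs are assumed to be already aligned across languages, for example through Wikipedia interlanguage links.

```python
from collections import Counter, defaultdict

# Hypothetical toy ESA index: word -> {Wikipedia concept: TF-IDF weight}.
# A real index would be built by inverting a concept/article TF-IDF matrix from a Wikipedia dump.
TOY_INDEX = {
    "castle": {"Castling": 1.0, "Castle": 0.5, "Middle_Ages": 0.25},
    "medieval": {"Middle_Ages": 0.75, "Castle": 0.25, "Knight": 0.5},
}

# Hypothetical per-language concept rankings with IDs already aligned across languages.
TOY_RANKINGS = {
    "en": ["Middle_Ages", "Castle", "Knight"],
    "fr": ["Castle", "Fortification", "Middle_Ages"],
    "de": ["Castle", "Knight"],
}


def esa_vector(words, index):
    """Classical ESA: the text vector is the sum of its words' concept vectors."""
    vec = defaultdict(float)
    for w in words:
        for concept, weight in index.get(w, {}).items():
            vec[concept] += weight
    return dict(vec)


def esa_c_ranking(words, index, top_k=3):
    """Assumed ESA-C-style ranking: prefer concepts supported by many topic words.

    Each distinct topic word contributes its top_k concepts. Concepts are ranked by
    the number of supporting words, then by summed weight, then by name.
    """
    support = Counter()
    weight = defaultdict(float)
    for w in set(words):
        best = sorted(index.get(w, {}).items(), key=lambda cw: (-cw[1], cw[0]))[:top_k]
        for concept, wt in best:
            support[concept] += 1
            weight[concept] += wt
    return sorted(support, key=lambda c: (-support[c], -weight[c], c))


def consolidate(words, ranked_concepts, n_extra=2):
    """Consolidated topic: original words plus the titles of the top related concepts."""
    extra = [c.replace("_", " ").lower() for c in ranked_concepts[:n_extra]]
    return list(words) + [e for e in extra if e not in words]


def vote_across_languages(rankings, top_n=10):
    """Rank concepts by the number of languages listing them, ties by best rank."""
    votes = Counter()
    best_rank = {}
    for ranked in rankings.values():
        for rank, concept in enumerate(ranked[:top_n]):
            votes[concept] += 1
            best_rank[concept] = min(rank, best_rank.get(concept, rank))
    return sorted(votes, key=lambda c: (-votes[c], best_rank[c], c))


if __name__ == "__main__":
    topic = ["medieval", "castle"]
    print(esa_vector(topic, TOY_INDEX))
    ranking = esa_c_ranking(topic, TOY_INDEX)
    print(ranking)
    print(consolidate(topic, ranking))
    print(vote_across_languages(TOY_RANKINGS))
```

On the toy topic "medieval castle", classical ESA ties "Castling" (linked only to "castle") with "Middle_Ages". The support-based ranking instead puts "Middle_Ages" and "Castle" first, because both topic words relate to them. This illustrates the partial-topic problem the abstract describes.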
