AMECON: Abstract meta-concept features for text-illustration

Abstract: Cross-media retrieval is a problem of high interest at the frontier between computer vision and natural language processing. The state of the art in the domain consists of learning a common space, under correlation or similarity constraints, from the textual and visual modalities, which are processed in parallel and possibly jointly. This paper proposes a different approach that casts the cross-modal problem as a supervised mapping of the visual modality onto the textual one. Each modality is thus seen as a particular projection of an abstract meta-concept, each of whose dimensions subsumes several semantic concepts (the "meta" aspect) but may not correspond to an actual one (the "abstract" aspect). In practice, the textual modality is used to generate a multi-label representation, which is then used as the target for mapping the visual modality through a simple shallow neural network. While quite easy to implement, our approach significantly outperforms the state of the art on the Flickr-8K and Flickr-30K datasets for the text-illustration task. The source code is available at http://perso.ecp.fr/~tamaazouy/.
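The abstract only outlines the pipeline; the sketch below is a minimal illustration of the idea as described, not the authors' implementation (which is available at the URL above). It assumes the multi-label targets are binary vectors derived from the captions and that pre-extracted image features are mapped to them by a one-hidden-layer network trained with a sigmoid cross-entropy loss; all dimensions, names, and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 4096-d image features (e.g., CNN activations),
# 1000-d abstract meta-concept space built from caption labels.
IMG_DIM, HID_DIM, META_DIM = 4096, 2048, 1000

class ShallowMapper(nn.Module):
    """One-hidden-layer network mapping visual features to the
    multi-label meta-concept representation derived from text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM, HID_DIM),
            nn.ReLU(),
            nn.Linear(HID_DIM, META_DIM),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; sigmoid is applied inside the loss

model = ShallowMapper()
criterion = nn.BCEWithLogitsLoss()  # standard multi-label objective
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy batch: random image features and sparse binary meta-concept targets.
img_feats = torch.randn(32, IMG_DIM)
targets = (torch.rand(32, META_DIM) > 0.95).float()

optimizer.zero_grad()
loss = criterion(model(img_feats), targets)
loss.backward()
optimizer.step()
```

Under this reading, text-illustration at retrieval time would amount to embedding the query text in the same meta-concept space and ranking images by the similarity between that vector and each image's predicted representation.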
Document type: Conference papers

https://hal-cea.archives-ouvertes.fr/cea-01813718
Contributor: Léna Le Roy
Submitted on: Tuesday, June 12, 2018 - 3:30:08 PM
Last modification on: Wednesday, January 23, 2019 - 2:39:24 PM

Citation

I. Chami, Y. Tamaazousti, H. Le Borgne. AMECON: Abstract meta-concept features for text-illustration. ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, Jun 2017, Bucharest, Romania. pp. 347-355, ⟨10.1145/3078971.3078993⟩. ⟨cea-01813718⟩
