Cross-modal classification by completing unimodal representations - CEA - Commissariat à l'énergie atomique et aux énergies alternatives
Conference paper, Year: 2016

Cross-modal classification by completing unimodal representations

Abstract

We argue that cross-modal classification, where models are trained on data from one modality (e.g. text) and applied to data from another (e.g. image), is a relevant problem in multimedia retrieval. We propose a method that addresses this specific problem, related to but different from cross-modal retrieval and bimodal classification. This method relies on a common latent space where both modalities have comparable representations and on an auxiliary dataset from which we build a more complete bimodal representation of any unimodal data. Evaluations on Pascal VOC07 and NUS-WIDE show that the novel representation method significantly improves the results compared to the use of a latent space alone. The level of performance achieved makes cross-modal classification a convincing choice for real applications.
Main file: ivlm08-tranA.pdf (1.14 MB)
Origin: files produced by the author(s)

Dates and versions

cea-01840417, version 1 (10-01-2020)

Identifiers

Cite

Thi Quynh Nhi Tran, Hervé Le Borgne, M. Crucianu. Cross-modal classification by completing unimodal representations. iV&L-MM '16 Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, Oct 2016, Amsterdam, Netherlands. pp.17-25, ⟨10.1145/2983563.2983570⟩. ⟨cea-01840417⟩