Abstract: Cross-modal tasks occur naturally for multimedia content that can be described along two or more modalities, such as visual content and text. Such tasks require "translating" information from one modality to another. Methods like kernelized canonical correlation analysis (KCCA) attempt to solve such tasks by finding aligned subspaces within the description spaces of the different modalities. Since they favor correlations over modality-specific information, these methods have shown some success in both cross-modal and bi-modal tasks. However, we show that directly using the subspace alignment obtained by KCCA only yields coarse translation abilities. To address this problem, we first put forward a new representation method that aggregates the information provided by the projections of both modalities onto their aligned subspaces. We further propose a method relying on neighborhoods in these subspaces to complete uni-modal information. Our proposal achieves state-of-the-art results for bi-modal classification on Pascal VOC07 and improves the state of the art by over 60% for cross-modal retrieval on Flickr 8K/30K.
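To make the pipeline described in the abstract concrete, here is a minimal sketch in Python. It uses linear CCA from scikit-learn as a stand-in for the kernelized CCA of the paper, and illustrates three steps: projecting both modalities onto aligned subspaces, aggregating the two projections into a bi-modal representation, and completing an image-only query via neighborhoods in the aligned subspace. The dimensions, the concatenation aggregator and the neighborhood-averaging step are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch (not the authors' implementation): linear CCA as a stand-in for KCCA.
# All names and dimensions below are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, d_img, d_txt, k = 500, 128, 64, 32          # samples, image dim, text dim, subspace dim
X_img = rng.normal(size=(n, d_img))            # image features (e.g. CNN descriptors)
X_txt = rng.normal(size=(n, d_txt))            # text features (e.g. bag-of-words / embeddings)

# 1) Align the two modalities: linear CCA here; the paper relies on kernelized CCA.
cca = CCA(n_components=k).fit(X_img, X_txt)
Z_img, Z_txt = cca.transform(X_img, X_txt)     # projections on the aligned subspaces

# 2) Bi-modal representation: aggregate the two projections (plain concatenation here).
Z_bimodal = np.hstack([Z_img, Z_txt])

# 3) Cross-modal completion: for an image-only query, borrow the text-side projections
#    of its nearest neighbors in the aligned image subspace.
nn = NearestNeighbors(n_neighbors=5).fit(Z_img)
q_img = rng.normal(size=(1, d_img))            # hypothetical image-only query
q_z = cca.transform(q_img)                     # its projection on the image subspace
idx = nn.kneighbors(q_z, return_distance=False)[0]
q_txt_completed = Z_txt[idx].mean(axis=0)      # estimated text-side projection
q_bimodal = np.hstack([q_z.ravel(), q_txt_completed])
```

In this sketch `q_bimodal` plays the role of the completed bi-modal descriptor that could then feed a classifier or a retrieval index; the averaging over neighbors is only one plausible aggregation choice.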
https://hal-cea.archives-ouvertes.fr/cea-01843176
Thi Quynh Nhi Tran, Hervé Le Borgne, Michel Crucianu. Aggregating image and text quantized correlated components. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, United States, 2016, pp. 2046-2054. ⟨10.1109/CVPR.2016.225⟩. ⟨cea-01843176⟩