Clustering social event images using kernel canonical correlation analysis, Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW '14, pp.814-819, 2014. ,
The devil is in the details: an evaluation of recent feature encoding methods, British Machine Vision Conference, 2011. ,
, Return of the devil in the details: Delving deep into convolutional nets, 2014.
Mind's eye: A recurrent visual representation for image caption generation, CVPR, 2015. ,
On the role of correlation and abstraction in cross-modal multimedia retrieval, vol.36, pp.521-535, 2014. ,
Subcategory-aware object classification, Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pp.827-834, 2013. ,
Cross-modal retrieval with correspondence autoencoder, Proc. of ACM Intl. Conf. on Multimedia, MM '14, 2014. ,
Devise: A deep visual-semantic embedding model, NIPS, pp.2121-2129, 2013. ,
A multi-view embedding space for modeling internet images, tags, and their semantics, IJCV, vol.106, issue.2, pp.210-233, 2014. ,
Canonical correlation analysis: An overview with application to learning methods, Neural Comput, vol.16, issue.12, pp.2639-2664, 2004. ,
Spatial pyramid pooling in deep convolutional networks for visual recognition, TPAMI, vol.37, issue.9, pp.1904-1916, 2015. ,
Framing image description as a ranking task: Data, models and evaluation metrics, Journal of Artificial Intelligence Research, pp.853-899, 2013. ,
Feature coding in image classification: A comprehensive study, TPAMI, vol.36, issue.3, pp.493-506, 2014. ,
Learning the relative importance of objects from tagged images for retrieval and crossmodal search, IJCV, vol.100, issue.2, pp.134-153, 2012. ,
Reading between the lines: Object localization using implicit cues from image tags, TPAMI, vol.34, issue.6, pp.1145-1158, 2012. ,
Aggregating local image descriptors into compact codes, TPAMI, vol.34, issue.9, pp.1704-1716, 2012. ,
URL : https://hal.archives-ouvertes.fr/inria-00633013
Deep visual-semantic alignments for generating image descriptions, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
Deep fragment embeddings for bidirectional image sentence mapping, Advances in neural information processing systems, pp.1889-1897, 2014. ,
Distributed representations of words and phrases and their compositionality, CoRR, 2013. ,
Multimodal deep learning, Proceedings of the 28th international conference on machine learning (ICML-11), pp.689-696, 2011. ,
Fisher vectors meet neural networks: A hybrid classification architecture, CVPR, 2015. ,
Collecting image annotations using amazon's mechanical turk, Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10, pp.139-147, 2010. ,
Image classification with the fisher vector: Theory and practice, vol.IJCV, pp.222-245, 2013. ,
Very deep convolutional networks for large-scale image recognition, 2014. ,
Grounded compositional semantics for finding and describing images with sentences, Transactions of the Association for Computational Linguistics, vol.2, pp.207-218, 2014. ,
Multimodal learning with deep boltzmann machines, Advances in neural information processing systems, pp.2222-2230, 2012. ,
On deep multi-view representation learning, International Conference on Machine Learning, 2015. ,
CNN: single-label to multi-label. CoRR, abs/1406, vol.5726, 2014. ,
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, TACL, vol.2, pp.67-78, 2014. ,
Bag-of-multimedia-words for image classification, ICPR, pp.1509-1512, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00825187