U. Ahsan and I. Essa, Clustering social event images using kernel canonical correlation analysis, Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, pp.814-819, 2014.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), pp.177-187, 2010.

Y. Boureau, J. Ponce, and Y. LeCun, A theoretical analysis of feature pooling in visual recognition, ICML, 2010.

X. Chen, Y. Mu, S. Yan, and T. Chua, Efficient large-scale image annotation by probabilistic collaborative multi-label propagation, Proceedings of the 18th ACM International Conference on Multimedia, pp.35-44, 2010.

T. Chua, J. Tang, R. Hong, H. Li, Z. Luo et al., NUS-WIDE: A real-world web image database from National University of Singapore, Proc. of ACM Conference on Image and Video Retrieval (CIVR'09), 2009.

J. C. Pereira, E. Coviello, G. Doyle, N. Rasiwasia, G. Lanckriet et al., On the role of correlation and abstraction in cross-modal multimedia retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.3, pp.521-535, 2014.

F. Feng, X. Wang, and R. Li, Cross-modal retrieval with correspondence autoencoder, Proc. of ACM International Conference on Multimedia, MM '14, 2014.

A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean et al., DeViSE: A deep visual-semantic embedding model, Advances in Neural Information Processing Systems, pp.2121-2129, 2013.

Y. Gong, Q. Ke, M. Isard, and S. Lazebnik, A multi-view embedding space for modeling internet images, tags, and their semantics, International Journal of Computer Vision, vol.106, issue.2, pp.210-233, 2014.

D. R. Hardoon, S. R. Szedmak, and J. R. Shawe-Taylor, Canonical correlation analysis: An overview with application to learning methods, Neural Computation, vol.16, issue.12, pp.2639-2664, 2004.

M. Hodosh, P. Young, and J. Hockenmaier, Framing image description as a ranking task: Data, models and evaluation metrics, Journal of Artificial Intelligence Research, pp.853-899, 2013.

S. J. Hwang and K. Grauman, Learning the relative importance of objects from tagged images for retrieval and cross-modal search, International Journal of Computer Vision, vol.100, issue.2, pp.134-153, 2012.

S. J. Hwang and K. Grauman, Reading between the lines: Object localization using implicit cues from image tags, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.6, pp.1145-1158, 2012.

A. Joly and O. Buisson, Random maximum margin hashing, The 24th IEEE Conference on Computer Vision and Pattern Recognition, pp.873-880, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00642178

A. Karpathy, A. Joulin, and L. Fei-Fei, Deep fragment embeddings for bidirectional image sentence mapping, Advances in Neural Information Processing Systems, pp.1889-1897, 2014.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, CoRR, 2013.

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee et al., Multimodal deep learning, International Conference on Machine Learning (ICML), pp.689-696, 2011.

D. Novak, M. Batko, and P. Zezula, Large-scale image retrieval using neural net descriptors, Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.1039-1040, 2015.

F. Perronnin, J. Sánchez, and Y. Liu, Large-scale image categorization with explicit data embedding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2297-2304, 2010.

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: an astounding baseline for recognition, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

N. Srivastava and R. R. Salakhutdinov, Multimodal learning with deep Boltzmann machines, Advances in Neural Information Processing Systems, pp.2222-2230, 2012.

G. Wang, D. Hoiem, and D. A. Forsyth, Building text features for object image classification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1367-1374, 2009.

W. Wang, R. Arora, K. Livescu, and J. Bilmes, On deep multi-view representation learning, International Conference on Machine Learning (ICML), 2015.