N. Tishby, F. C. Pereira, and W. Bialek, The Information Bottleneck Method, 37th Annual Allerton Conference on Communication, Control, and Computing, 1999.

N. Tishby and N. Zaslavsky, Deep learning and the information bottleneck principle, IEEE Information Theory Workshop (ITW), 2015.

R. Shwartz-Ziv and N. Tishby, Opening the Black Box of Deep Neural Networks via Information, 2017.

G. Chechik, A. Globerson, N. Tishby, and Y. Weiss, Information bottleneck for Gaussian variables, Journal of Machine Learning Research, vol.6, pp.165-188, 2005.

A. M. Saxe, Y. Bansal, J. Dapello, M. Advani, A. Kolchinsky et al., On the Information Bottleneck Theory of Deep Learning, International Conference on Learning Representations (ICLR), 2018.

Y. Kabashima, Inference from correlated patterns: a unified theory for perceptron learning and linear vector channels, Journal of Physics: Conference Series, vol.95, issue.1, p.12001, 2008.

A. Manoel, F. Krzakala, M. Mézard, and L. Zdeborová, Multi-layer generalized linear estimation, IEEE International Symposium on Information Theory, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01447203

A. K. Fletcher and S. Rangan, Inference in Deep Networks in High Dimensions, 2017.

G. Reeves, Additivity of Information in Multilayer Networks via Additive Gaussian Noise Transforms, 55th Annual Allerton Conference on Communication, Control, and Computing, 2017.

M. Mézard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond, 1987.

M. Mézard and A. Montanari, Information, Physics, and Computation, 2009.

Deep Neural Networks Entropy with Replicas, Python library

A. M. Tulino, G. Caire, S. Verdú, and S. Shamai, Support Recovery With Sparsely Sampled Free Random Matrices, IEEE Transactions on Information Theory, vol.59, issue.7, pp.4243-4271, 2013.

D. Donoho and A. Montanari, High dimensional robust M-estimation: asymptotic variance via approximate message passing, Probability Theory and Related Fields, vol.166, pp.935-969, 2016.

H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Physical Review A, vol.45, issue.8, p.6056, 1992.

A. Engel and C. Van den Broeck, Statistical Mechanics of Learning, 2001.

M. Opper and D. Saad, Advanced mean field methods: Theory and practice, 2001.

J. Barbier, F. Krzakala, N. Macris, L. Miolane, and L. Zdeborová, Phase Transitions, Optimal Errors and Optimality of Message-Passing in Generalized Linear Models, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01614258

J. Barbier, N. Macris, A. Maillard, and F. Krzakala, The Mutual Information in Random Linear Estimation Beyond i.i.d. Matrices, IEEE International Symposium on Information Theory (ISIT), 2018.

D. Donoho, A. Maleki, and A. Montanari, Message-passing algorithms for compressed sensing, Proceedings of the National Academy of Sciences, vol.106, issue.45, pp.18914-18919, 2009.

L. Zdeborová and F. Krzakala, Statistical physics of inference: thresholds and algorithms, Advances in Physics, vol.65, issue.5, pp.453-552, 2016.

S. Rangan, Generalized approximate message passing for estimation with random linear mixing, IEEE International Symposium on Information Theory (ISIT), 2011.

S. Rangan, P. Schniter, and A. K. Fletcher, Vector approximate message passing, IEEE International Symposium on Information Theory, 2017.

J. Barbier and N. Macris, The stochastic interpolation method: a simple scheme to prove replica formulas in Bayesian inference, 2017.

J. Barbier, N. Macris, and L. Miolane, The Layered Structure of Tensor Estimation and its Mutual Information, 55th Annual Allerton Conference on Communication, Control, and Computing, 2017.

M. Moczulski, M. Denil, J. Appleyard, and N. de Freitas, ACDC: A Structured Efficient Linear Layer, International Conference on Learning Representations (ICLR), 2016.

Z. Yang, M. Moczulski, M. Denil, N. de Freitas, A. Smola et al., Deep fried convnets, IEEE International Conference on Computer Vision (ICCV), 2015.

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Storing infinite numbers of patterns in a spin-glass model of neural networks, Physical Review Letters, vol.55, issue.14, p.1530, 1985.

E. Gardner and B. Derrida, Three unfinished works on the optimal storage capacity of networks, Journal of Physics A, vol.22, issue.12, p.1983, 1989.

M. Mézard, The space of interactions in neural networks: Gardner's computation with the cavity method, Journal of Physics A, vol.22, issue.12, p.2181, 1989.

C. Louart and R. Couillet, Harnessing neural networks: A random matrix approach, IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01957760

J. Pennington and P. Worah, Nonlinear random matrix theory for deep learning, Advances in Neural Information Processing Systems (NIPS), 2017.

M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J. Sohl-Dickstein, On the Expressive Power of Deep Neural Networks, International Conference on Machine Learning (ICML), 2017.

A. Saxe, J. McClelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, International Conference on Learning Representations (ICLR), 2014.

S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein, Deep information propagation, International Conference on Learning Representations (ICLR), 2017.

M. Advani and A. Saxe, High-dimensional dynamics of generalization error in neural networks, 2017.

C. Baldassi, A. Braunstein, N. Brunel, and R. Zecchina, Efficient supervised learning in networks with binary synapses, Proceedings of the National Academy of Sciences, vol.104, pp.11079-11084, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00174082

Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli et al., Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Advances in Neural Information Processing Systems, 2014.

R. Giryes, G. Sapiro, and A. M. Bronstein, Deep neural networks with random Gaussian weights: a universal classification strategy? IEEE Transactions on Signal Processing, vol.64, pp.3444-3457, 2016.

M. Chalk, O. Marre, and G. Tkačik, Relevant sparse codes with variational information bottleneck, Advances in Neural Information Processing Systems, 2016.

A. Achille and S. Soatto, Information Dropout: Learning Optimal Representations Through Noisy Computation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

A. Alemi, I. Fischer, J. Dillon, and K. Murphy, Deep variational information bottleneck, International Conference on Learning Representations (ICLR), 2017.

A. Achille and S. Soatto, Emergence of Invariance and Disentangling in Deep Representations, ICML 2017 Workshop on Principled Approaches to Deep Learning, 2017.

A. Kolchinsky, B. D. Tracey, and D. H. Wolpert, Nonlinear Information Bottleneck, 2017.

M. I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio et al., MINE: Mutual Information Neural Estimation, 2017.

S. Zhao, J. Song, and S. Ermon, InfoVAE: Information Maximizing Variational Autoencoders, 2017.

A. Kolchinsky and B. D. Tracey, Estimating mixture entropy with pairwise distances, Entropy, vol.19, issue.7, p.361, 2017.

A. Kraskov, H. Stögbauer, and P. Grassberger, Estimating mutual information, Physical Review E, vol.69, issue.6, p.66138, 2004.

Learning with Synthetic Data

D. Sherrington and S. Kirkpatrick, Solvable Model of a Spin-Glass, Physical Review Letters, vol.35, issue.26, p.1792, 1975.

H. Nishimori, Statistical Physics of Spin Glasses and Information Processing: An Introduction, 2001.

E. Gardner, The space of interactions in neural network models, Journal of Physics A, vol.21, issue.1, p.257, 1988.

E. Gardner and B. Derrida, Optimal storage properties of neural network models, Journal of Physics A, vol.21, issue.1, p.271, 1988.

T. Tanaka, A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors, IEEE Transactions on Information Theory, vol.48, issue.11, pp.2888-2910, 2002.

S. Rangan, V. Goyal, and A. K. Fletcher, Asymptotic analysis of MAP estimation via the replica method and compressed sensing, Advances in Neural Information Processing Systems (NIPS), 2009.

Y. Kabashima, T. Wadayama, and T. Tanaka, A typical reconstruction limit for compressed sensing based on Lp-norm minimization, Journal of Statistical Mechanics: Theory and Experiment, issue.09, p.9003, 2009.

S. Ganguli and H. Sompolinsky, Statistical Mechanics of Compressed Sensing, Physical Review Letters, vol.104, issue.18, p.188701, 2010.

F. Krzakala, M. Mézard, F. Sausset, Y. F. Sun, and L. Zdeborová, Statistical-physics-based reconstruction in compressed sensing, Physical Review X, vol.2, issue.2, p.21005, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00716897

E. Marinari, G. Parisi, and F. Ritort, Replica field theory for deterministic models. II. A non-random spin glass with glassy behaviour, Journal of Physics A, vol.27, issue.23, p.7647, 1994.

G. Parisi and M. Potters, Mean-field equations for spin models with orthogonal interaction matrices, Journal of Physics A, vol.28, issue.18, p.5267, 1995.

M. Opper and O. Winther, Tractable Approximations for Probabilistic Models: The Adaptive Thouless-Anderson-Palmer Mean Field Approach, Physical Review Letters, vol.86, issue.17, p.3695, 2001.

R. Cherrier, D. S. Dean, and A. Lefèvre, Role of the interaction matrix in mean-field spin glass models, Physical Review E, vol.67, issue.4, p.46112, 2003.

K. Takeda, S. Uda, and Y. Kabashima, Analysis of CDMA systems that are characterized by eigenvalue spectrum, Europhysics Letters, vol.76, issue.6, p.1193, 2006.

R. R. Müller, D. Guo, and A. L. Moustakas, Vector Precoding for Wireless MIMO Systems and its Replica Analysis, IEEE Journal on Selected Areas in Communications, vol.26, issue.3, pp.530-540, 2008.

T. Shinzato and Y. Kabashima, Perceptron capacity revisited: classification ability for correlated patterns, Journal of Physics A, vol.41, issue.32, p.324013, 2008.

T. Shinzato and Y. Kabashima, Learning from correlated patterns by simple perceptrons, Journal of Physics A, vol.42, issue.1, p.15005, 2009.

Y. Kabashima and M. Vehkaperä, Signal recovery using expectation consistent approximation for linear observations, IEEE International Symposium on Information Theory (ISIT), 2014.

M. Talagrand, Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models, 2003.

D. Panchenko, The Sherrington-Kirkpatrick model, 2013.

J. Barbier, M. Dia, N. Macris, F. Krzakala, T. Lesieur et al., Mutual Information for Symmetric Rank-one Matrix Estimation: A Proof of the Replica Formula, Advances in Neural Information Processing Systems (NIPS), 2016.
URL : https://hal.archives-ouvertes.fr/cea-01568705

M. Lelarge and L. Miolane, Fundamental limits of symmetric low-rank matrix estimation, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01648368

G. Reeves and H. D. Pfister, The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact, IEEE International Symposium on Information Theory (ISIT), 2016.

A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications, 2004.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821