J. Nelder and R. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society. Series A (General), vol.135, issue.3, pp.370-384, 1972.

P. McCullagh, Generalized linear models, European Journal of Operational Research, vol.16, issue.3, pp.285-292, 1984.

J. R. Fienup, Phase retrieval algorithms: a comparison, Applied Optics, vol.21, issue.15, pp.2758-2769, 1982.

L. Demanet and P. Hand, Stable optimizationless recovery from phaseless linear measurements, Journal of Fourier Analysis and Applications, vol.20, issue.1, pp.199-221, 2014.

E. J. Candes, T. Strohmer, and V. Voroninski, PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming, Communications on Pure and Applied Mathematics, vol.66, issue.8, pp.1241-1274, 2013.

P. T. Boufounos and R. G. Baraniuk, 1-bit compressive sensing, 42nd Annual Conference on Information Sciences and Systems (CISS), pp.16-21, 2008.

P. Bühlmann and S. van de Geer, Statistics for high-dimensional data: methods, theory and applications, 2011.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

D. L. Donoho and J. Tanner, Sparse nonnegative solution of underdetermined linear equations by linear programming, Proceedings of the National Academy of Sciences of the United States of America, vol.102, pp.9446-9451, 2005.

E. J. Candes and T. Tao, Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Transactions on Information Theory, vol.52, issue.12, pp.5406-5425, 2006.

D. L. Donoho, A. Maleki, and A. Montanari, Message-passing algorithms for compressed sensing, Proceedings of the National Academy of Sciences, vol.106, issue.45, pp.18914-18919, 2009.

S. Rangan, Generalized approximate message passing for estimation with random linear mixing, IEEE International Symposium on Information Theory Proceedings (ISIT), pp.2168-2172, 2011.

L. Zdeborová and F. Krzakala, Statistical physics of inference: thresholds and algorithms, Advances in Physics, vol.65, issue.5, pp.453-552, 2016.

U. Kamilov, V. K. Goyal, and S. Rangan, Optimal quantization for compressive sensing under message passing reconstruction, IEEE International Symposium on Information Theory Proceedings (ISIT), pp.459-463, 2011.

Y. Xu, Y. Kabashima, and L. Zdeborová, Bayesian signal reconstruction for 1-bit compressed sensing, Journal of Statistical Mechanics: Theory and Experiment, issue.11, p.11015, 2014.

P. Schniter and S. Rangan, Compressive phase retrieval via generalized approximate message passing, IEEE Transactions on Signal Processing, vol.63, issue.4, pp.1043-1055, 2015.

M. Bayati and A. Montanari, The LASSO risk for Gaussian matrices, IEEE Transactions on Information Theory, vol.58, issue.4, pp.1997-2017, 2012.

N. El Karoui, D. Bean, P. J. Bickel, C. Lim, and B. Yu, On robust regression with high-dimensional predictors, Proceedings of the National Academy of Sciences, vol.110, issue.36, pp.14557-14562, 2013.

D. Donoho and A. Montanari, High dimensional robust M-estimation: asymptotic variance via approximate message passing, Probability Theory and Related Fields, vol.166, pp.935-969, 2016.

R. Gribonval and P. Machart, Reconciling "priors" & "priors" without prejudice?, Advances in Neural Information Processing Systems, pp.2193-2201, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00918008

M. Advani and S. Ganguli, An equivalence between high dimensional Bayes optimal inference and M-estimation, Advances in Neural Information Processing Systems, 2016.

E. Gardner and B. Derrida, Three unfinished works on the optimal storage capacity of networks, Journal of Physics A: Mathematical and General, vol.22, issue.12, pp.1983-1994, 1989.

H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Phys. Rev. A, vol.45, pp.6056-6091, 1992.

T. L. Watkin, A. Rau, and M. Biehl, The statistical mechanics of learning a rule, Rev. Mod. Phys, vol.65, pp.499-556, 1993.

A. Engel and C. Van den Broeck, Statistical mechanics of learning, 2001.

A. Engel and L. Reimers, Reliability of replica symmetry for the generalization problem in a toy multilayer neural network, Europhys. Lett, vol.28, p.531, 1994.

G. J. Bex, R. Serneels, and C. Van den Broeck, Storage capacity and generalization error for the reversed-wedge Ising perceptron, Phys. Rev. E, vol.51, p.6309, 1995.

T. Hosaka, Y. Kabashima, and H. Nishimori, Statistical mechanics of lossy data compression using a nonmonotonic perceptron, Physical Review E, vol.66, issue.6, p.066126, 2002.

C. Baldassi, C. Borgs, J. T. Chayes, A. Ingrosso, C. Lucibello et al., Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes, Proceedings of the National Academy of Sciences, vol.113, issue.48, pp.E7655-E7662, 2016.

C. H. Martin and M. W. Mahoney, Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior, 2017.

N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method, Proc. of the 37th Annual Allerton Conference on Communication, Control and Computing, pp.368-377, 1999.

R. Shwartz-ziv and N. Tishby, Opening the black box of deep neural networks via information, 2017.

C. E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J., vol.27, pp.379-423 and 623-656, 1948.

T. Tanaka, A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors, IEEE Transactions on Information Theory, vol.48, issue.11, pp.2888-2910, 2002.

D. Guo and S. Verdú, Randomly spread CDMA: Asymptotics via statistical physics, IEEE Transactions on Information Theory, vol.51, issue.6, pp.1983-2010, 2005.

A. R. Barron and A. Joseph, Toward fast reliable communication at rates near capacity with Gaussian noise, IEEE International Symposium on Information Theory (ISIT), pp.315-319, 2010.

J. Barbier and F. Krzakala, Approximate message-passing decoder and capacity-achieving sparse superposition codes, IEEE Transactions on Information Theory, 2017.

C. Rush, A. Greig, and R. Venkataramanan, Capacity-achieving sparse superposition codes via approximate message passing decoding, IEEE Transactions on Information Theory, vol.63, issue.3, pp.1476-1500, 2017.

J. Barbier, M. Dia, and N. Macris, Threshold saturation of spatially coupled sparse superposition codes for all memoryless channels, IEEE Information Theory Workshop (ITW), 2016.

J. Barbier, M. Dia, and N. Macris, Universal sparse superposition codes with spatial coupling and GAMP decoding, 2017.

M. Mézard, The space of interactions in neural networks: Gardner's computation with the cavity method, Journal of Physics A: Mathematical and General, vol.22, issue.12, pp.2181-2190, 1989.

E. Bolthausen, An iterative construction of solutions of the TAP equations for the Sherrington-Kirkpatrick model, Communications in Mathematical Physics, vol.325, issue.1, pp.333-366, 2014.

M. Bayati and A. Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Transactions on Information Theory, vol.57, issue.2, pp.764-785, 2011.

M. Bayati, M. Lelarge, and A. Montanari, Universality in polytope phase transitions and message passing algorithms, The Annals of Applied Probability, vol.25, issue.2, pp.753-822, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01254901

A. Javanmard and A. Montanari, State evolution for general approximate message passing algorithms, with applications to spatial coupling, Information and Inference: A Journal of the IMA, vol.2, issue.2, pp.115-144, 2013.

J. Barbier, M. Dia, N. Macris, and F. Krzakala, The mutual information in random linear estimation, 54th Annual Allerton Conference on Communication, Control, and Computing, pp.625-632, 2016.

J. Barbier, N. Macris, M. Dia, and F. Krzakala, Mutual information and optimality of approximate message-passing in random linear estimation, 2017.

G. Reeves and H. D. Pfister, The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact, IEEE International Symposium on Information Theory (ISIT), pp.665-669, 2016.

M. Mézard, G. Parisi, and M. Virasoro, Spin glass theory and beyond, 1987.

G. Györgyi, First-order transition to perfect generalization in a neural network with binary synapses, Physical Review A, vol.41, issue.12, p.7097, 1990.

H. Sompolinsky, N. Tishby, and H. S. Seung, Learning from examples in large neural networks, Physical Review Letters, vol.65, issue.13, p.1683, 1990.

J. Barbier and N. Macris, The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference, Probability Theory and Related Fields, 2018.

A. Coja-Oghlan, F. Krzakala, W. Perkins, and L. Zdeborová, Information-theoretic thresholds from the cavity method, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp.146-157, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01448087

D. Guo, S. Shamai, and S. Verdú, Mutual information and minimum mean-square error in Gaussian channels, IEEE Transactions on Information Theory, vol.51, issue.4, pp.1261-1282, 2005.

M. Opper and D. Haussler, Generalization performance of Bayes optimal classification algorithm for learning a perceptron, Physical Review Letters, vol.66, issue.20, p.2677, 1991.

D. J. Thouless, P. W. Anderson, and R. G. Palmer, Solution of 'Solvable model of a spin glass', Philosophical Magazine, vol.35, issue.3, pp.593-601, 1977.

Y. Kabashima, Inference from correlated patterns: a unified theory for perceptron learning and linear vector channels, Journal of Physics: Conference Series, vol.95, issue.1, p.012001, 2008.

D. L. Donoho, A. Javanmard, and A. Montanari, Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing, IEEE Transactions on Information Theory, vol.59, issue.11, pp.7434-7464, 2013.

M. Opper and O. Winther, Mean field approach to Bayes learning in feed-forward neural networks, Physical Review Letters, vol.76, issue.11, p.1964, 1996.

M. Opper and O. Winther, Tractable approximations for probabilistic models: The adaptive Thouless-Anderson-Palmer mean field approach, Physical Review Letters, vol.86, issue.17, p.3695, 2001.

M. Mondelli and A. Montanari, Fundamental limits of weak recovery with applications to phase retrieval, 2017.

GitHub repository of numerical experiments for the current paper, 2017.

A. K. Fletcher and S. Rangan, Iterative reconstruction of rank-one matrices in noise, Information and Inference: A Journal of the IMA, vol.7, issue.3, pp.531-562, 2018.

D. Hansel, G. Mato, and C. Meunier, Memorization without generalization in a multilayered neural network, Europhysics Letters, vol.20, issue.5, p.471, 1992.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

F. Chollet, Keras, 2015.

Y. Wu and S. Verdú, Rényi information dimension: Fundamental limits of almost lossless analog compression, IEEE Transactions on Information Theory, vol.56, issue.8, pp.3721-3748, 2010.

F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, Statistical-physics-based reconstruction in compressed sensing, Phys. Rev. X, vol.2, p.021005, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00716897

A. Maleki, L. Anitori, Z. Yang, and R. G. Baraniuk, Asymptotic analysis of complex LASSO via complex approximate message passing (CAMP), IEEE Transactions on Information Theory, vol.59, issue.7, pp.4290-4308, 2013.

M. Soltanolkotabi, Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization, 2017.

F. Rosenblatt, The perceptron, a perceiving and recognizing automaton (Project Para), Cornell Aeronautical Laboratory, 1957.

F. Guerra and F. L. Toninelli, The thermodynamic limit in mean field spin glass models, Communications in Mathematical Physics, vol.230, issue.1, pp.71-79, 2002.

M. Talagrand, Mean field models for spin glasses: Volume I: Basic examples, vol.54, 2010.

M. Mézard and A. Montanari, Information, physics, and computation, 2009.

C. Baldassi, A. Braunstein, N. Brunel, and R. Zecchina, Efficient supervised learning in networks with binary synapses, Proceedings of the National Academy of Sciences, vol.104, issue.26, pp.11079-11084, 2007.

T. Richardson and R. Urbanke, Modern coding theory, 2008.

M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, vol.1, pp.1-305, 2008.

J. P. Vila and P. Schniter, Expectation-maximization Gaussian-mixture approximate message passing, IEEE Transactions on Signal Processing, vol.61, issue.19, pp.4658-4672, 2013.

J. Ziniel, P. Schniter, and P. Sederberg, Binary linear classification and feature selection via generalized approximate message passing, 48th Annual Conference on Information Sciences and Systems (CISS), pp.1-6, 2014.

F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, Probabilistic reconstruction in compressed sensing: algorithms, phase diagrams, and threshold achieving matrices, Journal of Statistical Mechanics: Theory and Experiment, vol.2012, issue.08, p.8009, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00764645

D. Donoho and J. Tanner, Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing, Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol.367, issue.1906, pp.4273-4293, 2009.

S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi, Simultaneously structured models with application to sparse and low-rank matrices, IEEE Transactions on Information Theory, vol.61, issue.5, pp.2886-2908, 2015.

E. B. Baum and Y. Lyuu, The transition to perfect generalization in perceptrons, Neural Computation, vol.3, issue.3, pp.386-401, 1991.

S. Diamond and S. Boyd, CVXPY: A Python-embedded modeling language for convex optimization, Journal of Machine Learning Research, 2016.

T. Goldstein and C. Studer, PhaseMax: Convex phase retrieval via basis pursuit, 2016.

J. Barbier, N. Macris, and L. Miolane, The layered structure of tensor estimation and its mutual information, 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2017.

A. Montanari, Estimating random variables from random sparse observations, European Transactions on Telecommunications, vol.19, issue.4, pp.385-403, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00290792

S. B. Korada and N. Macris, Exact solution of the gauge symmetric p-spin glass model on a complete graph, Journal of Statistical Physics, vol.136, issue.2, pp.205-230, 2009.

Y. Wu and S. Verdú, Optimal phase transitions in compressed sensing, IEEE Transactions on Information Theory, vol.58, issue.10, pp.6241-6263, 2012.

N. Macris, Griffith-Kelly-Sherman correlation inequalities: A useful tool in the theory of error correcting codes, IEEE Transactions on Information Theory, vol.53, issue.2, pp.664-683, 2007.

S. B. Korada and N. Macris, Tight bounds on the capacity of binary input random CDMA systems, IEEE Transactions on Information Theory, vol.56, issue.11, pp.5590-5613, 2010.

P. Hartman, Ordinary Differential Equations, Society for Industrial and Applied Mathematics, 1964.

P. Milgrom and I. Segal, Envelope theorems for arbitrary choice sets, Econometrica, vol.70, issue.2, pp.583-601, 2002.

D. Guo, Y. Wu, S. Shamai (Shitz), and S. Verdú, Estimation in Gaussian noise: Properties of the minimum mean-square error, IEEE Transactions on Information Theory, vol.57, issue.4, pp.2371-2385, 2011.

S. B. Korada and A. Montanari, Applications of the Lindeberg principle in communications and statistical learning, IEEE Transactions on Information Theory, vol.57, issue.4, pp.2440-2450, 2011.

Y. Wu and S. Verdú, Functional properties of minimum mean-square error and mutual information, IEEE Transactions on Information Theory, vol.58, issue.3, pp.1289-1301, 2012.

L. Miolane, Phase transitions in spiked matrix estimation: information-theoretic analysis, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01960925

V. I. Bogachev, Measure theory, vol.1, 2007.

M. Lelarge and L. Miolane, Fundamental limits of symmetric low-rank matrix estimation, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01648368

S. Chatterjee, A generalization of the Lindeberg principle, The Annals of Probability, vol.34, pp.2061-2076, 2006.

R. T. Rockafellar, Convex analysis, 2015.

S. Boucheron, G. Lugosi, and O. Bousquet, Concentration inequalities, Advanced Lectures on Machine Learning, pp.208-240, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00751496

J. Bezanson, S. Karpinski, V. Shah, and A. Edelman, Julia: A fast dynamic language for technical computing, 2012.

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
