C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, Exploring generalization in deep learning, Advances in Neural Information Processing Systems, pp.5947-5956, 2017.

V. N. Vapnik, The Nature of Statistical Learning Theory, 1998.

P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2002.

S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms, 2014.

H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Physical Review A, vol.45, issue.8, p.6056, 1992.

T. L. H. Watkin, A. Rau, and M. Biehl, The statistical mechanics of learning a rule, Reviews of Modern Physics, vol.65, issue.2, p.499, 1993.

M. Advani, S. Lahiri, and S. Ganguli, Statistical mechanics of complex neural systems and high dimensional data, Journal of Statistical Mechanics: Theory and Experiment, issue.03, p.3014, 2013.

M. S. Advani and A. M. Saxe, High-dimensional dynamics of generalization error in neural networks, 2017.

B. Aubin, A. Maillard, F. Krzakala, N. Macris, and L. Zdeborová, The committee machine: Computational to statistical gaps in learning a two-layers neural network, Advances in Neural Information Processing Systems, pp.3223-3234, 2018.
URL: https://hal.archives-ouvertes.fr/cea-01933130

E. J. Candès and P. Sur, The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression, 2018.

T. Hastie, A. Montanari, S. Rosset, and R. J. Tibshirani, Surprises in high-dimensional ridgeless least squares interpolation, 2019.

S. Mei and A. Montanari, The generalization error of random features regression: Precise asymptotics and double descent curve, 2019.

S. Goldt, M. Mézard, F. Krzakala, and L. Zdeborová, Modelling the influence of data structure on learning in neural networks, 2019.

M. Mézard, Mean-field message-passing equations in the Hopfield model and its generalizations, Physical Review E, vol.95, issue.2, p.22117, 2017.

L. Chizat, E. Oyallon, and F. Bach, On lazy training in differentiable programming, Advances in Neural Information Processing Systems, vol.32, pp.2933-2943, 2019.

A. Jacot, F. Gabriel, and C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks, Advances in Neural Information Processing Systems, pp.8571-8580, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01824549

M. Geiger, S. Spigler, A. Jacot, and M. Wyart, Disentangling feature and lazy learning in deep neural networks: an empirical study, 2019.

C. Louart, Z. Liao, and R. Couillet, A random matrix approach to neural networks, The Annals of Applied Probability, vol.28, issue.2, pp.1190-1248, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01957656

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems, vol.20, pp.1177-1184, 2008.

M. Mézard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, World Scientific Publishing Company, vol.9, 1987.

A. Engel and C. Van den Broeck, Statistical Mechanics of Learning, 2001.

L. Zdeborová and F. Krzakala, Statistical physics of inference: Thresholds and algorithms, Advances in Physics, vol.65, issue.5, pp.453-552, 2016.

M. Talagrand, The Parisi formula, Annals of Mathematics, vol.163, pp.221-263, 2006.

J. Barbier, F. Krzakala, N. Macris, L. Miolane, and L. Zdeborová, Optimal errors and phase transitions in high-dimensional generalized linear models, Proceedings of the National Academy of Sciences, vol.116, pp.5451-5460, 2019.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, 2013.

M. E. A. Seddik, C. Louart, M. Tamaazousti, and R. Couillet, Random matrix theory proves that deep learning representations of GAN-data behave as Gaussian mixtures, 2020.

S. S. Du, X. Zhai, B. Póczos, and A. Singh, Gradient descent provably optimizes over-parameterized neural networks, 2018.

Z. Allen-zhu, Y. Li, and Z. Song, A convergence theory for deep learning via overparameterization, International Conference on Machine Learning, pp.242-252, 2019.

B. Woodworth, S. Gunasekar, J. Lee, D. Soudry, and N. Srebro, Kernel and deep regimes in overparametrized models, 2019.

Q. Le, T. Sarlós, and A. Smola, Fastfood: Approximating kernel expansions in loglinear time, Proceedings of the International Conference on Machine Learning, vol.85, 2013.

M. Moczulski, M. Denil, J. Appleyard, and N. de Freitas, ACDC: A structured efficient linear layer, 2015.

A. Montanari, F. Ruan, Y. Sohn, and J. Yan, The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime, 2019.

M. Belkin, D. Hsu, S. Ma, and S. Mandal, Reconciling modern machine-learning practice and the classical bias-variance trade-off, Proceedings of the National Academy of Sciences, vol.116, pp.15849-15854, 2019.

S. Spigler, M. Geiger, S. d'Ascoli, L. Sagun, G. Biroli, and M. Wyart, A jamming transition from under- to over-parametrization affects generalization in deep learning, Journal of Physics A: Mathematical and Theoretical, vol.52, issue.47, p.474001, 2019.

K. M. Choromanski, M. Rowland, and A. Weller, The unreasonable effectiveness of structured random orthogonal embeddings, Advances in Neural Information Processing Systems, pp.219-228, 2017.

E. Gardner and B. Derrida, Three unfinished works on the optimal storage capacity of networks, Journal of Physics A: Mathematical and General, vol.22, issue.12, pp.1983-1994, 1989.

Y. Kabashima, T. Wadayama, and T. Tanaka, A typical reconstruction limit for compressed sensing based on lp-norm minimization, Journal of Statistical Mechanics: Theory and Experiment, issue.09, p.9003, 2009.

F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, Statistical-physics-based reconstruction in compressed sensing, Physical Review X, vol.2, issue.2, p.21005, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00716897

X. Cheng and A. Singer, The spectrum of random inner-product kernel matrices, Random Matrices: Theory and Applications, vol.02, p.1350010, 2013.

J. Pennington and P. Worah, Nonlinear random matrix theory for deep learning, Advances in Neural Information Processing Systems, vol.30, pp.2637-2646, 2017.

M. E. A. Seddik, M. Tamaazousti, and R. Couillet, Kernel random matrices of large concentrated data: the example of GAN-generated images, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7480-7484, 2019.

B. Neal, S. Mittal, A. Baratin, V. Tantia, M. Scicluna, S. Lacoste-Julien, and I. Mitliagkas, A modern take on the bias-variance tradeoff in neural networks, 2018.

M. Geiger, A. Jacot, S. Spigler, F. Gabriel, L. Sagun et al., Scaling description of generalization with number of parameters in deep learning, 2019.

P. Nakkiran, G. Kaplun, Y. Bansal, T. Yang, B. Barak, and I. Sutskever, Deep double descent: Where bigger models and more data hurt, 2019.

S. Geman, E. Bienenstock, and R. Doursat, Neural networks and the bias/variance dilemma, Neural computation, vol.4, issue.1, pp.1-58, 1992.

L. Breiman, Reflections after refereeing papers for NIPS, The Mathematics of Generalization, pp.11-15, 1995.

M. Opper and W. Kinzel, Statistical mechanics of generalization, Models of neural networks III, pp.151-209, 1996.

T. M. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Transactions on Electronic Computers, issue.3, pp.326-334, 1965.

B. Schölkopf, A. J. Smola, and F. Bach, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002.

A. Rudi, L. Carratino, and L. Rosasco, Falkon: An optimal large scale kernel method, Advances in Neural Information Processing Systems, pp.3888-3898, 2017.

A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.

Y. Zhang, J. Duchi, and M. Wainwright, Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates, The Journal of Machine Learning Research, vol.16, issue.1, pp.3299-3340, 2015.

A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Drémeau et al., Random projections through multiple optical scattering: Approximating kernels at the speed of light, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6215-6219, 2016.

R. Ohana, J. Wacker, J. Dong, S. Marmin, F. Krzakala et al., Kernel computations from large-scale random features obtained by optical processing units, 2019.

A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt, Practical and optimal LSH for angular distance, Advances in Neural Information Processing Systems, pp.1225-1233, 2015.

M. Bojarski, A. Choromanska, K. Choromanski, F. Fagan, C. Gouy-Pailler et al., Structured adaptive and random spinners for fast machine learning computations, 2016.
URL: https://hal.archives-ouvertes.fr/hal-02010086

W. Hachem, P. Loubaton, and J. Najim, Deterministic equivalents for certain functionals of large random matrices, Ann. Appl. Probab, vol.17, issue.3, pp.875-930, 2007.
URL: https://hal.archives-ouvertes.fr/hal-00621793

Z. Fan and A. Montanari, The spectral norm of random inner-product kernel matrices, Probability Theory and Related Fields, vol.173, pp.27-85, 2019.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00650905

L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller et al., API design for machine learning software: experiences from the scikit-learn project, ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp.108-122, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00856511