Y. LeCun, Y. Bengio, and G. E. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, International Conference on Learning Representations, 2015.

P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, issue.3, pp.463-482, 2003.

M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2012.

B. Neyshabur, R. Tomioka, and N. Srebro, Norm-Based Capacity Control in Neural Networks, Conference on Learning Theory, 2015.

N. Golowich, A. Rakhlin, and O. Shamir, Size-Independent Sample Complexity of Neural Networks, 2017.

G. K. Dziugaite and D. M. Roy, Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, 2017.

S. Arora, R. Ge, B. Neyshabur, and Y. Zhang, Stronger generalization bounds for deep nets via a compression approach, 2018.

Z. Allen-Zhu, Y. Li, and Y. Liang, Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018.

B. Neyshabur, R. Tomioka, and N. Srebro, In search of the real inductive bias: On the role of implicit regularization in deep learning, International Conference on Learning Representations, 2015.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2017.

D. Arpit, S. Jastrzębski, M. S. Kanwal, T. Maharaj, A. Fischer et al., A Closer Look at Memorization in Deep Networks, Proceedings of the 34th International Conference on Machine Learning, 2017.

M. Biehl and H. Schwarze, Learning by on-line gradient descent, Journal of Physics A: Mathematical and General, vol.28, issue.3, pp.643-656, 1995. DOI: 10.1088/0305-4470/28/3/018

D. Saad and S. A. Solla, Exact Solution for On-Line Learning in Multilayer Neural Networks, Physical Review Letters, vol.74, issue.21, pp.4337-4340, 1995.

D. Saad and S. A. Solla, On-line learning in soft committee machines, Physical Review E, vol.52, issue.4, pp.4225-4243, 1995.

E. Gardner and B. Derrida, Three unfinished works on the optimal storage capacity of networks, Journal of Physics A: Mathematical and General, vol.22, issue.12, pp.1983-1994, 1989.

H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Physical Review A, vol.45, issue.8, pp.6056-6091, 1992.

T. L. Watkin, A. Rau, and M. Biehl, The statistical mechanics of learning a rule, Reviews of Modern Physics, vol.65, issue.2, pp.499-556, 1993.

A. Engel and C. Van den Broeck, Statistical Mechanics of Learning, 2001.

L. Zdeborová and F. Krzakala, Statistical physics of inference: thresholds and algorithms, Advances in Physics, vol.65, issue.5, pp.453-552, 2016.

M. S. Advani and S. Ganguli, Statistical mechanics of optimal convex inference in high dimensions, Physical Review X, vol.6, issue.3, pp.1-16, 2016.

P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi et al., Entropy-SGD: Biasing Gradient Descent Into Wide Valleys, 2017.

M. Advani and A. M. Saxe, High-dimensional dynamics of generalization error in neural networks, 2017.

B. Aubin, A. Maillard, J. Barbier, F. Krzakala, N. Macris et al., The committee machine: Computational to statistical gaps in learning a two-layers neural network, Advances in Neural Information Processing Systems, vol.31, pp.3227-3238, 2018. URL: https://hal.archives-ouvertes.fr/cea-01933130

M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous et al., Comparing Dynamics: Deep Neural Networks versus Glassy Systems, Proceedings of the 35th International Conference on Machine Learning, 2018.

H. Schwarze, Learning a rule in a multilayer neural network, Journal of Physics A: Mathematical and General, vol.26, issue.21, pp.5781-5794, 1993. DOI: 10.1088/0305-4470/26/21/017

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol.2, issue.4, pp.303-314, 1989.

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989. DOI: 10.1016/0893-6080(89)90020-8

S. Mei, A. Montanari, and P.-M. Nguyen, A mean field view of the landscape of two-layer neural networks, Proceedings of the National Academy of Sciences, vol.115, issue.33, pp.E7665-E7671, 2018.

G. M. Rotskoff and E. Vanden-Eijnden, Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks, Advances in Neural Information Processing Systems, vol.31, pp.7146-7155, 2018.

L. Chizat and F. Bach, On the global convergence of gradient descent for over-parameterized models using optimal transport, Advances in Neural Information Processing Systems, vol.31, pp.3040-3050, 2018. URL: https://hal.archives-ouvertes.fr/hal-01798792

Y. Li and Y. Liang, Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, Advances in Neural Information Processing Systems, vol.31, 2018.

W. Kinzel and P. Ruján, Improving a Network Generalization Ability by Selecting Examples, EPL (Europhysics Letters), vol.13, issue.5, pp.473-477, 1990.

C. W. Mace and A. C. Coolen, Statistical mechanical analysis of the dynamics of learning in perceptrons, Statistics and Computing, vol.8, issue.1, pp.55-88, 1998.

D. Saad and S. A. Solla, Learning with Noise and Regularizers in Multilayer Neural Networks, Advances in Neural Information Processing Systems 9, pp.260-266, 1997.

E. Oja and J. Karhunen, On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix, Journal of Mathematical Analysis and Applications, vol.106, issue.1, pp.69-84, 1985.

C. Wang, J. Mattingly, and Y. M. Lu, Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA, 2017.

C. Wang, H. Hu, and Y. M. Lu, A Solvable High-Dimensional Model of GAN, 2018.

A. Brutzkus, A. Globerson, E. Malach, and S. Shalev-Shwartz, SGD learns overparameterized networks that provably generalize on linearly separable data, International Conference on Learning Representations, 2018.

M. Soltanolkotabi, A. Javanmard, and J. D. Lee, Theoretical insights into the optimization landscape of over-parameterized shallow neural networks, IEEE Transactions on Information Theory, vol.65, issue.2, pp.742-769, 2018.

A. Krogh and J. A. Hertz, Generalization in a linear perceptron in the presence of noise, Journal of Physics A: Mathematical and General, vol.25, issue.5, pp.1135-1147, 1992.

A. M. Saxe, J. L. McClelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, International Conference on Learning Representations, 2014.

A. K. Lampinen and S. Ganguli, An analytic theory of generalization dynamics and transfer learning in deep linear networks, International Conference on Learning Representations, 2019.