Generalisation dynamics of online learning in over-parameterised neural networks

Abstract: Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data. Here we study the generalisation dynamics of two-layer neural networks in a teacher-student setup, where one network, the student, is trained by stochastic gradient descent (SGD) on data generated by another network, called the teacher. We show that, for this problem, the dynamics of SGD are captured by a set of differential equations. In particular, we demonstrate analytically that the generalisation error of the student increases linearly with the network size, with all other relevant parameters held constant. Our results indicate that achieving good generalisation in neural networks depends on the interplay of at least the algorithm, its learning rate, the model architecture, and the data set.
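
To make the setup concrete, below is a minimal sketch of online (one-pass) SGD in the teacher-student setting the abstract describes. All concrete choices here, the input dimension, the hidden-layer sizes, the tanh activation, the learning rate, and the initialisation scales, are illustrative assumptions rather than the paper's exact settings; the student is simply made wider than the teacher (K > M), which is the over-parameterised regime the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 500    # input dimension (assumed for illustration)
M = 2      # teacher hidden units (assumed)
K = 8      # student hidden units; K > M makes the student over-parameterised
lr = 0.2   # SGD learning rate (assumed)

def g(x):
    """Activation function; tanh is an illustrative stand-in."""
    return np.tanh(x)

def g_prime(x):
    return 1.0 - np.tanh(x) ** 2

def forward(W, v, x):
    """Two-layer network: y = sum_k v_k * g(w_k . x / sqrt(D))."""
    return v @ g(W @ x / np.sqrt(D))

# Fixed teacher network that generates the labels.
W_star = rng.standard_normal((M, D))
v_star = rng.standard_normal(M)

# Wider student network, trained to imitate the teacher.
W = rng.standard_normal((K, D))
v = rng.standard_normal(K) / np.sqrt(K)

def generalisation_error(n_test=5000):
    """Mean half-squared error between student and teacher on fresh inputs."""
    X = rng.standard_normal((n_test, D))
    y_teacher = g(X @ W_star.T / np.sqrt(D)) @ v_star
    y_student = g(X @ W.T / np.sqrt(D)) @ v
    return 0.5 * np.mean((y_student - y_teacher) ** 2)

# Online SGD: every step draws a fresh sample, so no example is ever
# reused and the training loss tracks the generalisation error directly.
for step in range(100_000):
    x = rng.standard_normal(D)
    y = forward(W_star, v_star, x)
    pre = W @ x / np.sqrt(D)          # student pre-activations
    delta = v @ g(pre) - y            # prediction error on this sample
    # Exact gradients of the half-squared loss for this architecture.
    W -= lr * np.outer(delta * v * g_prime(pre), x) / np.sqrt(D)
    v -= lr * delta * g(pre)
    if step % 20_000 == 0:
        print(f"step {step:>6}: eg = {generalisation_error():.5f}")
```

Because each step uses a fresh sample, the step count plays the role of training time, which is what makes a description of the dynamics by differential equations natural in this setting.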
Document type: Preprints, Working Papers, ...
Cited literature: 45 references

https://hal-cea.archives-ouvertes.fr/cea-02009764
Contributor: Emmanuelle de Laborderie
Submitted on: Wednesday, February 6, 2019 - 3:31:44 PM
Last modification on: Tuesday, September 22, 2020 - 3:50:42 AM
Long-term archiving on: Tuesday, May 7, 2019 - 4:02:33 PM

File

  • generalisation.pdf (files produced by the author(s))

Identifiers

  • HAL Id: cea-02009764, version 1
  • arXiv: 1901.09085

Citation

Sebastian Goldt, Madhu Advani, Andrew Saxe, Florent Krzakala, Lenka Zdeborova. Generalisation dynamics of online learning in over-parameterised neural networks. 2019. ⟨cea-02009764⟩

Metrics

  • Record views: 225
  • File downloads: 81