
On the spectral bias of two-layer linear networks

  • EPFL
  • Courant Institute of Mathematical Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

This paper studies the behaviour of two-layer fully connected networks with linear activations trained with gradient flow on the square loss. We show how the optimization process carries an implicit bias on the parameters that depends on the scale of the initialization. The main result of the paper is a variational characterization of the loss minimizers retrieved by the gradient flow for a specific initialization shape. This characterization reveals that, in the small scale initialization regime, the linear neural network's hidden layer is biased toward having a low-rank structure. To complement our results, we showcase a hidden mirror flow that tracks the dynamics of the singular values of the weight matrices and describe their time evolution. We support our findings with numerical experiments illustrating the phenomena.
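The low-rank bias described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic rank-2 regression task, the dimensions, the Gaussian initialization (the paper's "specific initialization shape" is not reproduced here), the helper names `train` and `effective_rank`, and the use of small-step gradient descent as a stand-in for gradient flow. The sketch only compares the singular values of the hidden layer after training from a small and a larger initialization scale.

```python
import numpy as np

def train(init_scale, X, Y, hidden=20, lr=0.01, steps=5000, seed=0):
    """Small-step gradient descent (a stand-in for gradient flow) on the
    square loss (1 / 2n) * ||X W1 W2 - Y||_F^2 of a two-layer linear network."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    # Assumption: i.i.d. Gaussian initialization, scaled by init_scale.
    W1 = init_scale * rng.standard_normal((d, hidden))  # hidden layer
    W2 = init_scale * rng.standard_normal((hidden, k))  # output layer
    for _ in range(steps):
        # Gradient of the loss with respect to the product W1 @ W2.
        G = X.T @ (X @ W1 @ W2 - Y) / n
        # Chain rule; both updates use the weights from before the step.
        W1, W2 = W1 - lr * G @ W2.T, W2 - lr * W1.T @ G
    return W1, W2

def effective_rank(W, rel_tol=0.05):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical synthetic task: noiseless targets from a rank-2 linear map.
rng = np.random.default_rng(1)
n, d, k, r = 200, 10, 5, 2
X = rng.standard_normal((n, d))
U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((k, r)))  # orthonormal columns
B_star = U @ np.diag([2.0, 1.0]) @ V.T
Y = X @ B_star

W1_small, W2_small = train(1e-3, X, Y)  # small-scale initialization
W1_large, W2_large = train(0.5, X, Y)   # larger-scale initialization

rank_small = effective_rank(W1_small)
rank_large = effective_rank(W1_large)
fit_error_small = np.linalg.norm(W1_small @ W2_small - B_star) / np.linalg.norm(B_star)

print("hidden-layer singular values, small init:",
      np.round(np.linalg.svd(W1_small, compute_uv=False), 3))
print("hidden-layer singular values, large init:",
      np.round(np.linalg.svd(W1_large, compute_uv=False), 3))
print("effective ranks (small, large):", rank_small, rank_large)
```

In this toy setting both runs can fit the data, but the hidden layer trained from the small initialization should keep only about as many non-negligible singular values as the rank of the target map, while the larger initialization leaves many directions of the random starting point in place. The threshold `rel_tol` is an arbitrary choice for counting "non-negligible" singular values.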

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors: A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: Neural Information Processing Systems Foundation
ISBN (Electronic): 9781713899921
Publication status: Published - 1 Jan 2023
Externally published: Yes
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 → 16 Dec 2023

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 36
ISSN (Print): 1049-5258

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 → 16/12/23
