
Dodging the Double Descent in Deep Neural Networks

  • Institut Polytechnique de Paris

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Finding the optimal size of deep learning models is a timely problem with broad impact, especially for energy-saving schemes. Recently, an unexpected phenomenon, the "double descent", has caught the attention of the deep learning community: as the model's size grows, performance first gets worse and then improves again. This raises serious questions about the model size needed to maintain high generalization: the model needs to be sufficiently over-parametrized, but adding too many parameters wastes training resources. Is it possible to find the best trade-off efficiently? Our work shows that the double descent phenomenon is potentially avoidable with proper conditioning of the learning problem, but a final answer is yet to be found. We empirically observe that there is hope to dodge the double descent in complex scenarios with proper regularization, as even a simple ℓ2 regularization already contributes positively in this direction.

Original language: English
Title of host publication: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
Publisher: IEEE Computer Society
Pages: 1625-1629
Number of pages: 5
ISBN (Electronic): 9781728198354
DOIs
Publication status: Published - 1 Jan 2023
Event: 30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia
Duration: 8 Oct 2023 to 11 Oct 2023

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 30th IEEE International Conference on Image Processing, ICIP 2023
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 8/10/23 to 11/10/23
