
DSD2: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?

  • Institut Polytechnique de Paris

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. As the sparsity of the model increases, the test performance first worsens because the model overfits the training data; then the overfitting reduces, leading to an improvement in performance; finally, the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the emergence of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.
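For readers unfamiliar with how sparsity is swept in such studies, here is a minimal sketch of magnitude pruning at increasing sparsity levels. It is not the DSD2 framework or its entropy measure. The function names, the 20% per-round pruning rate, and the random stand-in weights are all assumptions chosen for illustration.

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def measured_sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def sparsity_schedule(rate, steps):
    """Cumulative sparsity after each round when `rate` of the remaining weights is pruned."""
    return [1.0 - (1.0 - rate) ** k for k in range(1, steps + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for a trained layer's weights.
    weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    for target in sparsity_schedule(rate=0.2, steps=5):
        pruned = magnitude_prune(weights, target)
        # In a real sweep the model would be retrained and evaluated here,
        # recording test accuracy against sparsity.
        print(f"target={target:.3f} measured={measured_sparsity(pruned):.3f}")
```

Plotting test performance against the sparsity levels from such a sweep is how a curve like the one described in the abstract would be observed. The retraining and evaluation steps are deliberately left out of this sketch.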

Original language: English
Pages (from-to): 14749-14757
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue number: 13
State: Published - 25 Mar 2024
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 16 - Peace, Justice and Strong Institutions
