Abstract
Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. Indeed, as the sparsity of the model increases, test performance first worsens because the model overfits the training data; then the overfitting diminishes, leading to an improvement in performance; finally, the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the emergence of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.
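To illustrate why a sparse double descent curve defeats a traditional stopping rule, the following minimal sketch applies a patience-based criterion to two error curves indexed by increasing sparsity. The function `stop_index` and all numeric values are hypothetical and chosen only for illustration; they are not results from the paper, and this is not the authors' framework or entropy measure.

```python
def stop_index(errors, patience=1):
    """Return the index of the best error seen before a patience-based stop.

    The scan halts once `patience` consecutive steps fail to improve on the
    best error observed so far.
    """
    best_idx = 0
    waited = 0
    for i in range(1, len(errors)):
        if errors[i] < errors[best_idx]:
            best_idx = i
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx


# Hypothetical test errors (%) at increasing sparsity levels (illustrative only).
sparsity = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99]

# Double-descent shape: error rises, falls to a better minimum, then collapses.
double_descent = [8.0, 8.6, 9.1, 8.4, 7.5, 7.2, 7.9, 15.0]

# Shape without the intermediate bump: error falls, then collapses.
no_bump = [8.0, 7.8, 7.6, 7.4, 7.2, 7.1, 7.9, 15.0]

chosen_dd = stop_index(double_descent)
chosen_nb = stop_index(no_bump)
best_dd = min(range(len(double_descent)), key=double_descent.__getitem__)

print("double descent: stopped at sparsity", sparsity[chosen_dd],
      "but best is at", sparsity[best_dd])
print("no bump: stopped at sparsity", sparsity[chosen_nb])
```

With the double-descent curve, the rule halts at the first worsening and never reaches the better minimum at higher sparsity; on the curve without the bump, the same rule stops at the true minimum.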
| Original language | English |
|---|---|
| Pages (from-to) | 14749-14757 |
| Number of pages | 9 |
| Journal | Proceedings of the AAAI Conference on Artificial Intelligence |
| Volume | 38 |
| Issue number | 13 |
| Publication status | Published - 25 Mar 2024 |
| Event | 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada (20 Feb 2024 → 27 Feb 2024) |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 16 Peace, Justice and Strong Institutions
Fingerprint
Dive into the research topics of 'DSD2: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?'. Together they form a unique fingerprint.