
Sparse Double Descent in Vision Transformers: Real or Phantom Threat?

  • Institut Polytechnique de Paris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision transformers (ViTs) have attracted broad interest in recent theoretical and empirical work. They achieve state-of-the-art results thanks to their attention-based approach, which avoids the inductive biases of convolutional architectures and thereby improves the identification of key features and patterns within images, resulting in highly accurate image analysis. Meanwhile, recent studies have reported a “sparse double descent” phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can still generalize well. This raises practical questions about the optimal model size and opens the quest for the best trade-off between sparsity and performance: are vision transformers also prone to sparse double descent, and can the phenomenon be avoided? Our work tackles the occurrence of sparse double descent in ViTs. While previous work has shown that traditional architectures such as ResNet are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally tuned $\ell_2$ regularization relieves it. However, everything comes at a cost: the optimal regularization strength $\lambda$ sacrifices the potential compression of the ViT.
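To make the experimental setup concrete, here is a minimal sketch (not the authors' code) of the kind of protocol the abstract alludes to: sweep sparsity levels by global magnitude pruning of a ViT while training with $\ell_2$ regularization of strength lam, then inspect whether accuracy is non-monotone in sparsity. The helpers build_vit, train, and evaluate are hypothetical placeholders for a model constructor, training loop, and test-accuracy function; only the pruning call uses a real PyTorch API.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def global_magnitude_prune(model, sparsity):
    # Zero out the `sparsity` fraction of smallest-magnitude weights
    # across all linear layers (attention projections and MLPs in a ViT).
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, nn.Linear)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=sparsity)

def sweep(lam, sparsities=(0.0, 0.5, 0.8, 0.9, 0.95, 0.99)):
    # Test accuracy at each sparsity level for a given l2 strength `lam`.
    accs = {}
    for s in sparsities:
        model = build_vit()  # hypothetical: returns an untrained ViT
        # l2 regularization enters through the optimizer's weight_decay.
        opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=lam)
        train(model, opt)  # hypothetical: dense training pass
        if s > 0:
            global_magnitude_prune(model, s)
            train(model, opt)  # hypothetical: fine-tune the sparse model
        accs[s] = evaluate(model)  # hypothetical: test accuracy
    return accs

Sparse double descent would appear as a dip-then-rise in the accuracies over the sparsity axis; the paper's finding is that a well-tuned lam flattens this curve, at the price of less achievable compression.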

Original language: English
Title of host publication: Image Analysis and Processing – ICIAP 2023 - 22nd International Conference, ICIAP 2023, Proceedings
Editors: Gian Luca Foresti, Andrea Fusiello, Edwin Hancock
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 490-502
Number of pages: 13
ISBN (Print): 9783031431524
Publication status: Published - 1 Jan 2023
Event: 22nd International Conference on Image Analysis and Processing, ICIAP 2023 - Udine, Italy
Duration: 11 Sept 2023 – 15 Sept 2023

Publication series

Name: Lecture Notes in Computer Science
Volume: 14234 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Image Analysis and Processing, ICIAP 2023
Country/Territory: Italy
City: Udine
Period: 11/09/23 – 15/09/23

Keywords

  • Sparse double descent
  • deep learning
  • pruning
  • transformers
