Mini but Mighty: Finetuning ViTs with Mini Adapters

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via finetuning. Recent works have proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of full finetuning. In this work, we observe that adapters perform poorly when their hidden dimension is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters, which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across three benchmarks, DomainNet, VTAB, and Multi-task, for a total of 29 datasets.
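The abstract describes bottleneck adapters (down-projection, nonlinearity, up-projection, residual) whose hidden neurons are scored and iteratively pruned. Below is a minimal pure-Python sketch of this idea; the function names and the importance score (product of each hidden neuron's incoming and outgoing weight norms) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def matvec(W, x):
    # Multiply matrix W (list of row vectors) by vector x.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def adapter_forward(x, W_down, W_up):
    """Bottleneck adapter: down-project the d-dim input to a small
    hidden dimension r, apply ReLU, up-project back to d, then add
    the residual connection."""
    h = [max(0.0, v) for v in matvec(W_down, x)]   # d -> r, ReLU
    out = matvec(W_up, h)                          # r -> d
    return [xi + oi for xi, oi in zip(x, out)]     # residual

def neuron_importance(W_down, W_up):
    """Hypothetical per-neuron score: the norm of each hidden
    neuron's incoming weights times the norm of its outgoing
    weights. Low-scoring neurons would be pruned to shrink the
    adapter (illustrative, not the paper's scoring function)."""
    scores = []
    for i in range(len(W_down)):                   # one score per hidden neuron
        in_norm = math.sqrt(sum(w * w for w in W_down[i]))
        out_norm = math.sqrt(sum(W_up[j][i] ** 2 for j in range(len(W_up))))
        scores.append(in_norm * out_norm)
    return scores
```

Starting from a large r and repeatedly dropping the lowest-scoring hidden neurons mirrors the "start large, iteratively reduce" training loop sketched in the abstract.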

Original language: English
Title of host publication: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1721-1730
Number of pages: 10
ISBN (Electronic): 9798350318920
DOIs
Publication status: Published - 3 Jan 2024
Event: 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024 - Waikoloa, United States
Duration: 4 Jan 2024 to 8 Jan 2024

Publication series

Name: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024

Conference

Conference: 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Country/Territory: United States
City: Waikoloa
Period: 4/01/24 to 8/01/24

Keywords

  • Algorithms
  • Machine learning architectures, formulations, and algorithms

