TY - GEN
T1 - Mini but Mighty: Finetuning ViTs with Mini Adapters
T2 - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
AU - Marouf, Imad Eddine
AU - Tartaglione, Enzo
AU - Lathuilière, Stéphane
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/1/3
Y1 - 2024/1/3
N2 - Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via finetuning. Recent works have proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage costs of finetuning. In this work, we observe that adapters perform poorly when their hidden dimension is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters, which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across three benchmarks, DomainNet, VTAB, and Multi-task, covering a total of 29 datasets.
AB - Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via finetuning. Recent works have proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage costs of finetuning. In this work, we observe that adapters perform poorly when their hidden dimension is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters, which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across three benchmarks, DomainNet, VTAB, and Multi-task, covering a total of 29 datasets.
KW - Algorithms
KW - Machine learning architectures, formulations, and algorithms
U2 - 10.1109/WACV57701.2024.00175
DO - 10.1109/WACV57701.2024.00175
M3 - Conference contribution
AN - SCOPUS:85190587611
T3 - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
SP - 1721
EP - 1730
BT - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 January 2024 through 8 January 2024
ER -