Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

Authors: Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on 26 classification datasets in total and 3 robustness & out-of-distribution datasets. SSF obtains state-of-the-art performance compared to other parameter-efficient fine-tuning methods in terms of the trade-off between trainable parameters and accuracy (Table 1 and Figure 1). (A minimal sketch of the SSF idea appears after this table.)
Researcher Affiliation | Collaboration | Dongze Lian (1), Daquan Zhou (1,2), Jiashi Feng (2), Xinchao Wang (1); affiliations: (1) National University of Singapore, (2) ByteDance.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/dongzelian/SSF.
Open Datasets | Yes | We mainly conduct our experiments on a series of datasets that can be categorized into three types as detailed below. FGVC: CUB-200-2011 [57], NABirds [55], Oxford Flowers [44], Stanford Dogs [30], and Stanford Cars [12]. VTAB-1k: the benchmark introduced in [67]. General image classification: CIFAR-100 [31] and ImageNet-1K [7].
Dataset Splits | Yes | ImageNet-1K contains 1.28M training images and 50K validation images across 1,000 categories, making it a very large dataset for object recognition.
Hardware Specification | Yes | All running results in Figure 3 are measured on a single GeForce RTX 2080Ti GPU.
Software Dependencies | No | The paper mentions using the 'AdamW [41] optimizer' and 'mixed precision training' but does not provide specific version numbers for any software dependencies like programming languages, frameworks, or libraries.
Experiment Setup | Yes | We employ the AdamW [41] optimizer to fine-tune models for 100 epochs on CIFAR-100 and 30 epochs on ImageNet-1K. The cosine decay strategy is adopted for the learning rate schedule, and linear warm-up is used in the first 10 epochs for CIFAR-100 and 5 epochs for ImageNet-1K. We employ a batch size of 16 for both the training and inference stages, and use mixed precision training. (A sketch of this recipe appears after this table.)
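
For readers unfamiliar with the method, SSF fine-tunes a frozen pre-trained backbone by learning only a per-channel scale and shift of its intermediate features. The PyTorch sketch below illustrates that idea; the class name `ScaleShift`, the identity initialization, and the tensor shapes are illustrative assumptions, not the authors' released code (which is linked in the table above).

```python
import torch
import torch.nn as nn


class ScaleShift(nn.Module):
    """Per-channel scale-and-shift module in the spirit of SSF.

    Minimal sketch, not the official implementation: only `scale` and
    `shift` are trainable, and they are applied along the channel
    dimension of the frozen backbone's features (y = scale * x + shift).
    """

    def __init__(self, dim: int):
        super().__init__()
        # Identity initialization is an assumption made here for clarity;
        # the official code may initialize these parameters differently.
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (..., dim); broadcasting applies the same per-channel
        # scale and shift to every token / spatial position.
        return x * self.scale + self.shift


# Usage sketch: freeze the backbone and train only the scale/shift
# parameters (and typically the classification head).
features = torch.randn(8, 197, 768)   # e.g. ViT tokens: (batch, tokens, dim)
ssf = ScaleShift(dim=768)
out = ssf(features)                   # same shape as the input
```

Because the module reduces to an identity map when scale is 1 and shift is 0, it can be inserted after the backbone's operations without disturbing the pre-trained features at the start of fine-tuning.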
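To make the quoted experiment setup concrete, here is a hedged PyTorch sketch of the fine-tuning recipe (AdamW, cosine decay with linear warm-up, batch size 16, mixed precision). The model, data, learning rate, weight decay, and warm-up start factor are placeholder assumptions; only the optimizer choice, schedule shape, epoch counts, and batch size come from the quoted text.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in model and data so the sketch runs end to end; swap in the real
# backbone (with SSF modules) and the real dataset loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 100)).to(device)
train_loader = [(torch.randn(16, 3, 224, 224), torch.randint(0, 100, (16,)))
                for _ in range(4)]    # batch size 16, as in the paper

epochs, warmup_epochs = 100, 10       # CIFAR-100 setting; 30 / 5 for ImageNet-1K
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)  # placeholder values
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01,
                                           total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                    T_max=epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer,
                                                  schedulers=[warmup, cosine],
                                                  milestones=[warmup_epochs])
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for epoch in range(epochs):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == "cuda")):  # mixed precision
            loss = nn.functional.cross_entropy(model(images), labels)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()  # per-epoch cosine schedule with linear warm-up
```

The schedule is stepped per epoch here for simplicity; a per-iteration schedule would also match the quoted description.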