Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
Authors: Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on 26 classification datasets in total and 3 robustness & out-of-distribution datasets. SSF obtains state-of-the-art performance compared to other parameter-efficient fine-tuning methods with the trainable parameters and accuracy trade-off (Table 1 and Figure 1). |
| Researcher Affiliation | Collaboration | Dongze Lian (1), Daquan Zhou (1,2), Jiashi Feng (2), Xinchao Wang (1); (1) National University of Singapore, (2) ByteDance |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/dongzelian/SSF. |
| Open Datasets | Yes | We mainly conduct our experiments on a series of datasets that can be categorized into three types as detailed below. FGVC: CUB-200-2011 [57], NABirds [55], Oxford Flowers [44], Stanford Dogs [30], and Stanford Cars [12]. VTAB-1k: the benchmark introduced in [67]. General image classification datasets: CIFAR-100 [31] and ImageNet-1K [7]. |
| Dataset Splits | Yes | ImageNet-1K contains 1.28M training images and 50K validation images across 1,000 categories, making it a very large dataset for object recognition. |
| Hardware Specification | Yes | All running results in Figure 3 are measured on a single GeForce RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using the 'AdamW [41] optimizer' and 'mixed precision training' but does not provide specific version numbers for any software dependencies such as programming languages, frameworks, or libraries. |
| Experiment Setup | Yes | We employ the AdamW [41] optimizer to fine-tune models for 100 epochs for CIFAR-100 and 30 epochs for ImageNet-1K. The cosine decay strategy is adopted for the learning rate schedule, and linear warm-up is used in the first 10 epochs for CIFAR-100 and 5 epochs for ImageNet-1K. We employ a batch size of 16 for the training and inference stages, and use mixed precision training. A minimal sketch of this setup follows the table. |
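
For context on the method named in the title, below is a minimal PyTorch sketch of a scale-and-shift (SSF-style) feature modulation module in which only the per-dimension scale and shift parameters are trained. The class name `SSFScaleShift`, its initialization, and the insertion point shown in the usage snippet are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

```python
import torch
import torch.nn as nn


class SSFScaleShift(nn.Module):
    """Illustrative scale-and-shift module (name and init are assumptions)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # per-dimension scale
        self.beta = nn.Parameter(torch.zeros(dim))   # per-dimension shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (..., dim); scale and shift broadcast over leading dims.
        return x * self.gamma + self.beta


# Usage sketch: freeze a backbone layer and train only the SSF parameters,
# which is where the trainable-parameter savings come from.
layer = nn.Linear(768, 768)
for p in layer.parameters():
    p.requires_grad = False                    # backbone stays frozen
ssf = SSFScaleShift(768)                       # only gamma/beta are trainable
out = ssf(layer(torch.randn(4, 197, 768)))     # (batch, tokens, dim)
```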
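
The Experiment Setup row reports AdamW with cosine decay, linear warm-up, a batch size of 16, and mixed precision training. The following is a minimal PyTorch sketch of such a recipe, assuming a placeholder model, random data, and an illustrative base learning rate; these placeholders are not the paper's values.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"                 # mixed precision only on GPU
epochs, warmup_epochs = 30, 5              # ImageNet-1K schedule from the paper
base_lr = 1e-3                             # illustrative; not from the paper

# Placeholder model and random data so the sketch runs end-to-end.
model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1000)).to(device)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 3, 32, 32), torch.randint(0, 1000, (64,)))
train_loader = torch.utils.data.DataLoader(dataset, batch_size=16)

# In a parameter-efficient setting, only trainable parameters
# (e.g. the scale/shift terms) would be passed to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=base_lr)
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

scaler = GradScaler(enabled=use_amp)
for epoch in range(epochs):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        with autocast(enabled=use_amp):
            loss = torch.nn.functional.cross_entropy(model(images), labels)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()   # epoch-level schedule: linear warm-up, then cosine decay
```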