FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer
Authors: Shibo Jie, Zhi-Hong Deng
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On VTAB-1K benchmark, our method performs on par with NOAH, the state-of-the-art PETL method, while being 5× more parameter-efficient. We also present a tiny version that only uses 8K (0.01% of ViT's parameters) trainable parameters but outperforms full fine-tuning and many other PETL methods such as VPT and BitFit. In few-shot settings, FacT also beats all PETL baselines using the fewest parameters, demonstrating its strong capability in the low-data regime. |
| Researcher Affiliation | Academia | Shibo Jie, Zhi-Hong Deng* School of Intelligence Science and Technology, Peking University {parsley, zhdeng}@pku.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use VTAB-1K benchmark (Zhai et al. 2019) to evaluate the performance of our methods in terms of PETL. VTAB-1K consists of 19 different visual classification datasets, which can be divided into three groups: Natural, Specialized, and Structured. Each dataset only contains 1,000 training samples. |
| Dataset Splits | No | The paper mentions 1,000 training samples and reports results on 'test sets', but it does not specify a distinct validation set split or percentages for training, validation, and test splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' but does not provide specific software dependencies like library names with version numbers (e.g., PyTorch 1.9, Python 3.8, CUDA 11.1). |
| Experiment Setup | Yes | Following Zhang, Zhou, and Liu (2022), we use AdamW optimizer with a learning rate of 1e-3 and batch size of 64 to train for 100 epochs. The hyper-parameter s is roughly swept from {0.01, 0.1, 1, 10, 100}. |
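
The experiment-setup row above pins down the optimizer (AdamW), learning rate (1e-3), batch size (64), epoch count (100), and the sweep over the scaling hyper-parameter s. Since the paper releases no code, the snippet below is only a minimal PyTorch sketch of that configuration; `build_fact_model` and `vtab_train_loader` are hypothetical placeholders, and only the optimizer settings and the sweep values {0.01, 0.1, 1, 10, 100} come from the quoted text.

```python
# Hedged sketch of the reported training configuration: AdamW, lr 1e-3,
# batch size 64, 100 epochs, with the scaling hyper-parameter s swept
# over {0.01, 0.1, 1, 10, 100}. `build_fact_model` and `vtab_train_loader`
# are hypothetical placeholders -- the paper does not release code.
import torch
from torch.optim import AdamW

SCALE_SWEEP = [0.01, 0.1, 1, 10, 100]  # candidate values for s (from the paper)
EPOCHS = 100
BATCH_SIZE = 64
LR = 1e-3


def train_one_setting(model, train_loader, device="cuda"):
    """Train only the trainable (adaptation) parameters; the backbone stays frozen."""
    # Assumption: the FacT factors are the only parameters with requires_grad=True.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = AdamW(params, lr=LR)
    criterion = torch.nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()
    for _ in range(EPOCHS):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model


# Sweep the scaling hyper-parameter s on each VTAB-1K task
# (constructors below are hypothetical, hence commented out):
# for s in SCALE_SWEEP:
#     model = build_fact_model(scale=s)
#     train_one_setting(model, vtab_train_loader(batch_size=BATCH_SIZE))
```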