SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
Authors: Vijay Chandra Lingam, Atula Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, Sujay Sanghavi
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that achieve only up to 85% performance with 0.03 to 0.8% of the trainable parameter budget. |
| Researcher Affiliation | Academia | University of Texas at Austin; University of Cologne; CISPA Helmholtz Center for Information Security |
| Pseudocode | No | The paper describes the SVFT formulation mathematically and with diagrams, but does not include explicit pseudocode or algorithm blocks (a hedged code sketch of the formulation follows this table). |
| Open Source Code | Yes | Code is available at https://github.com/VijayLingam95/SVFT/ |
| Open Datasets | Yes | Language. For natural language generation (NLG) tasks, we evaluate on GSM-8K [7] and MATH [12] by fine-tuning on MetaMathQA-40K [35]... We also evaluate on 8 commonsense reasoning benchmarks (BoolQ [5], PIQA [3], SIQA [30], HellaSwag [36], Winogrande [29], ARC-easy/challenge [6], and OpenBookQA [23])... For natural language understanding (NLU), we evaluate on the General Language Understanding Evaluation (GLUE) benchmark... Vision. Our experiments on vision tasks consist of 4 benchmarks: CIFAR-100 [18], Food101 [4], RESISC45 [33], and Flowers102 [24]. |
| Dataset Splits | Yes | For each dataset in the vision tasks, we train on 10 samples per class, using 2 examples per class for validation, and test on the full test set. (A minimal per-class split sketch follows this table.) |
| Hardware Specification | Yes | All of our experiments are conducted on a Linux machine (Debian GNU) with the following specifications: 2 A100 80 GB, Intel Xeon CPU @ 2.20GHz with 12 cores, and 192 GB RAM. |
| Software Dependencies | No | The paper mentions 'mixed weight precision (e.g., bfloat16)' but does not specify key software components like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries with their specific version numbers. |
| Experiment Setup | Yes | The complete details of our experimental setup and hyperparameter configurations are provided in Appendix C. Baselines. We compare with Full Fine-Tuning (FT) updating all learnable parameters in all layers, along with LoRA [15], DoRA [19], BOFT [20], and VeRA [17]. Target Modules. We adapt all weight matrices for SVFT, as it does not increase trainable parameters at the same rate as baseline methods. For baselines, we adapt the target modules recommended in [19]: QKVUD matrices for LoRA and DoRA, compatible matrices for VeRA, and QV matrices for BOFT to stay within GPU memory limits. Additional details can be found in Appendix C.7 and C.8. (An illustrative target-module configuration follows this table.) |
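
Since the paper gives no pseudocode (see the Pseudocode row), the following is a minimal PyTorch sketch of the SVFT update it describes: a frozen weight W = U diag(S) Vᵀ is adapted only through trainable coefficients M placed at a fixed sparse pattern between the singular vectors, giving an effective weight U (diag(S) + M) Vᵀ. The class name `SVFTLinear`, the `off_diag` banding parameter, and the zero initialization are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the SVFT idea: freeze W's singular
# vectors and train only sparse coefficients M between them, so the effective
# weight becomes U @ (diag(S) + M * mask) @ Vh.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SVFTLinear(nn.Module):
    """Linear layer whose frozen weight is adapted via sparse singular-vector coefficients."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor = None, off_diag: int = 0):
        super().__init__()
        # SVD of the frozen pretrained weight: W = U diag(S) Vh, with r = min(out, in).
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)    # (out, r), frozen
        self.register_buffer("S", S)    # (r,),    frozen
        self.register_buffer("Vh", Vh)  # (r, in), frozen
        self.register_buffer("bias", None if bias is None else bias.clone())

        r = S.shape[0]
        # Fixed sparsity pattern: the diagonal plus `off_diag` bands on either side
        # (a banded variant; the paper also studies other fixed patterns).
        idx = torch.arange(r)
        rows, cols = torch.meshgrid(idx, idx, indexing="ij")
        self.register_buffer("mask", (rows - cols).abs() <= off_diag)
        # Trainable coefficients, zero-initialized so training starts exactly at W.
        self.M = nn.Parameter(torch.zeros(r, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        core = torch.diag(self.S) + self.M * self.mask
        return F.linear(x, self.U @ core @ self.Vh, self.bias)
```

In use, each target weight matrix of the backbone (attention and MLP projections) would be wrapped in such a module, so only the masked entries of M are trained, which is how the very small trainable-parameter fractions in the table arise.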
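The Dataset Splits row quotes a 10-train / 2-validation examples-per-class protocol for the vision tasks. The sketch below shows one plausible way to realize such a split; the helper name `per_class_split` and the fixed seed are assumptions for illustration, not taken from the paper.

```python
# Hypothetical per-class few-shot split matching the quoted protocol:
# 10 training and 2 validation examples per class, full test set untouched.
import random
from collections import defaultdict


def per_class_split(samples, train_per_class=10, val_per_class=2, seed=0):
    """samples: iterable of (example, label) pairs; returns (train, val) lists."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example, label in samples:
        by_label[label].append((example, label))
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        train.extend(items[:train_per_class])
        val.extend(items[train_per_class:train_per_class + val_per_class])
    return train, val
```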
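Finally, the Experiment Setup row notes that baselines adapt specific target modules (e.g., QKVUD matrices for LoRA and DoRA). As one illustration, a LoRA baseline restricted to those modules could be configured with Hugging Face PEFT as below; the model name, rank, and dropout are placeholder assumptions, and the module names assume LLaMA-style layer naming rather than the paper's exact configs (see Appendix C.7 and C.8 of the paper).

```python
# Illustrative (assumed) PEFT configuration for a LoRA baseline restricted to
# query/key/value/up/down projections; hyperparameters are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],  # QKVUD
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # compare against SVFT's 0.006-0.25% budget
```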