SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
Authors: Vijay Chandra Lingam, Atula Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, Sujay Sanghavi
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that achieve only up to 85% performance with 0.03 to 0.8% of the trainable parameter budget. |
| Researcher Affiliation | Academia | University of Texas at Austin; University of Cologne; CISPA Helmholtz Center for Information Security |
| Pseudocode | No | The paper describes the SVFT formulation mathematically and with diagrams, but does not include explicit pseudocode or algorithm blocks (a hedged code sketch of the formulation follows this table). |
| Open Source Code | Yes | Code is available at https://github.com/VijayLingam95/SVFT/ |
| Open Datasets | Yes | Language. For natural language generation (NLG) tasks, we evaluate on GSM-8K [7] and MATH [12] by fine-tuning on MetaMathQA-40K [35]... We also evaluate on 8 commonsense reasoning benchmarks (BoolQ [5], PIQA [3], SIQA [30], HellaSwag [36], Winogrande [29], ARC-easy/challenge [6], and OpenBookQA [23])... For natural language understanding (NLU), we evaluate on the General Language Understanding Evaluation (GLUE) benchmark... Vision. Our experiments on vision tasks consist of 4 benchmarks: CIFAR-100 [18], Food101 [4], RESISC45 [33], and Flowers102 [24]. |
| Dataset Splits | Yes | For each dataset in the vision tasks, we train on 10 samples per class, using 2 examples per class for validation, and test on the full test set. (A minimal per-class split sketch follows this table.) |
| Hardware Specification | Yes | All of our experiments are conducted on a Linux machine (Debian GNU) with the following specifications: 2 A100 80 GB, Intel Xeon CPU @ 2.20GHz with 12 cores, and 192 GB RAM. |
| Software Dependencies | No | The paper mentions 'mixed weight precision (e.g., bfloat16)' but does not specify key software components like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries with their specific version numbers. |
| Experiment Setup | Yes | The complete details of our experimental setup and hyperparameter configurations are provided in Appendix C. Baselines. We compare with Full Fine-Tuning (FT) updating all learnable parameters in all layers, along with LoRA [15], DoRA [19], BOFT [20], and VeRA [17]. Target Modules. We adapt all weight matrices for SVFT, as it does not increase trainable parameters at the same rate as baseline methods. For baselines, we adapt the target modules recommended in [19]: QKVUD matrices for LoRA and DoRA, compatible matrices for VeRA, and QV matrices for BOFT to stay within GPU memory limits. Additional details can be found in Appendix C.7 and C.8. (An illustrative target-module configuration follows this table.) |
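
Since the paper gives no pseudocode (see the Pseudocode row), the following is a minimal PyTorch sketch of the SVFT update it describes: a frozen weight W = U diag(S) Vᵀ is adapted only through trainable coefficients M placed at a fixed sparse pattern between the singular vectors, giving an effective weight U (diag(S) + M) Vᵀ. The class name `SVFTLinear`, the `off_diag` banding parameter, and the zero initialization are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the SVFT idea: freeze W's singular
# vectors and train only sparse coefficients M between them, so the effective
# weight becomes U @ (diag(S) + M * mask) @ Vh.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SVFTLinear(nn.Module):
    """Linear layer whose frozen weight is adapted via sparse singular-vector coefficients."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor = None, off_diag: int = 0):
        super().__init__()
        # SVD of the frozen pretrained weight: W = U diag(S) Vh, with r = min(out, in).
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)    # (out, r), frozen
        self.register_buffer("S", S)    # (r,),    frozen
        self.register_buffer("Vh", Vh)  # (r, in), frozen
        self.register_buffer("bias", None if bias is None else bias.clone())

        r = S.shape[0]
        # Fixed sparsity pattern: the diagonal plus `off_diag` bands on either side
        # (a banded variant; the paper also studies other fixed patterns).
        idx = torch.arange(r)
        rows, cols = torch.meshgrid(idx, idx, indexing="ij")
        self.register_buffer("mask", (rows - cols).abs() <= off_diag)
        # Trainable coefficients, zero-initialized so training starts exactly at W.
        self.M = nn.Parameter(torch.zeros(r, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        core = torch.diag(self.S) + self.M * self.mask
        return F.linear(x, self.U @ core @ self.Vh, self.bias)
```

In use, each target weight matrix of the backbone (attention and MLP projections) would be wrapped in such a module, so only the masked entries of M are trained, which is how the very small trainable-parameter fractions in the table arise.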
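The Dataset Splits row quotes a 10-train / 2-validation examples-per-class protocol for the vision tasks. The sketch below shows one plausible way to realize such a split; the helper name `per_class_split` and the fixed seed are assumptions for illustration, not taken from the paper.

```python
# Hypothetical per-class few-shot split matching the quoted protocol:
# 10 training and 2 validation examples per class, full test set untouched.
import random
from collections import defaultdict


def per_class_split(samples, train_per_class=10, val_per_class=2, seed=0):
    """samples: iterable of (example, label) pairs; returns (train, val) lists."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example, label in samples:
        by_label[label].append((example, label))
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        train.extend(items[:train_per_class])
        val.extend(items[train_per_class:train_per_class + val_per_class])
    return train, val
```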
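Finally, the Experiment Setup row notes that baselines adapt specific target modules (e.g., QKVUD matrices for LoRA and DoRA). As one illustration, a LoRA baseline restricted to those modules could be configured with Hugging Face PEFT as below; the model name, rank, and dropout are placeholder assumptions, and the module names assume LLaMA-style layer naming rather than the paper's exact configs (see Appendix C.7 and C.8 of the paper).

```python
# Illustrative (assumed) PEFT configuration for a LoRA baseline restricted to
# query/key/value/up/down projections; hyperparameters are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],  # QKVUD
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # compare against SVFT's 0.006-0.25% budget
```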