VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

Authors: Yang Li, Shaobo Han, Jonathan Shihao Ji

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, instruction tuning, and mathematical reasoning tasks.
Researcher Affiliation | Collaboration | Yang Li, Dept. of Computer Science, Georgia State University, Atlanta, GA 30303, yli93@student.gsu.edu; Shaobo Han, Optical Networking and Sensing, NEC Laboratories America, Princeton, NJ 08540, shaobo@nec-labs.com; Shihao Ji, School of Computing, University of Connecticut, Storrs, CT 06269, shihao.ji@uconn.edu
Pseudocode | Yes | Algorithm 1: Pseudocode of VB-LoRA in a PyTorch-like style.
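The paper's Algorithm 1 gives PyTorch-like pseudocode for VB-LoRA. As a minimal, dependency-free sketch of the core idea only — composing each low-rank sub-vector as a renormalized top-k admixture over a shared vector bank — the following pure-Python example may help; the function name `topk_admixture` and all variable names are hypothetical, and this is not the authors' implementation:

```python
import math
import random

def topk_admixture(bank, logits, k=2):
    """Compose one sub-vector as a weighted sum of the top-k bank vectors.

    Weights come from a softmax over the logits, restricted to the k
    largest entries and renormalized (a simplified, illustrative reading
    of the top-k admixture module; not the paper's exact code).
    """
    # Numerically stable softmax over all logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Keep the k largest weights and renormalize them to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    s = sum(probs[i] for i in top)
    weights = {i: probs[i] / s for i in top}
    # Weighted sum of the selected bank vectors.
    length = len(bank[0])
    return [sum(weights[i] * bank[i][j] for i in top) for j in range(length)]

# Bank size (90), vector length (256), and the init scales below follow the
# experiment-setup row of this table.
random.seed(0)
h, b = 90, 256
bank = [[random.uniform(-0.02, 0.02) for _ in range(b)] for _ in range(h)]
logits = [random.gauss(0, 0.01) for _ in range(h)]
sub_vector = topk_admixture(bank, logits, k=2)
print(len(sub_vector))  # → 256
```

In the actual method, both the bank and the logits are learned end-to-end, and the composed sub-vectors are assembled into the LoRA factors; this sketch only shows the selection-and-mixing step.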
Open Source Code | Yes | Our source code is available at https://github.com/leo-yangli/VB-LoRA.
Open Datasets | Yes | We adopt the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018]; GPT-2 Medium and Large models [Radford et al., 2019] on the E2E dataset [Novikova et al., 2017]; the Cleaned Alpaca Dataset; MT-Bench [Zheng et al., 2024]; the MetaMathQA dataset [Yu et al., 2023]; and the GSM8K [Cobbe et al., 2021] and MATH [Hendrycks et al., 2021] datasets. All are accompanied by citations and/or URLs with licensing information.
Dataset Splits | Yes | For natural language generation experiments, we fine-tune the GPT-2 Medium and Large models [Radford et al., 2019] on the E2E dataset [Novikova et al., 2017], which contains approximately 42,000 training examples, 4,600 validation examples, and 4,600 test examples from the restaurant domain. We adopt the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018] to assess the performance of VB-LoRA across various natural language understanding tasks.
Hardware Specification | Yes | All our experiments were conducted on a server equipped with 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using a 'PyTorch-like style' for pseudocode and integrating into the 'PyTorch framework', as well as the 'QLoRA framework', but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We create a vector bank of 90 vectors of length 256, initialized with a uniform distribution U(−0.02, 0.02). The logits are initialized with a normal distribution N(0, 0.01). The learning rates for the vector bank and logit parameters are set to 0.001 and 0.01, respectively. We set the rank to 4 and k = 2 for all our experiments.
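As a back-of-the-envelope check on why this setup is parameter-efficient, the shared bank of 90 vectors of length 256 stores only 23,040 parameters, and it is shared across the whole model. The comparison below against plain rank-4 LoRA uses assumed transformer dimensions (hidden size 1024, 24 layers, two adapted weight matrices per layer) purely for illustration; those dimensions are not taken from this table, and VB-LoRA additionally stores per-sub-vector selection logits not counted here:

```python
# Shared vector bank from the experiment setup: 90 vectors of length 256.
bank_params = 90 * 256  # 23,040 parameters, shared model-wide

# Plain LoRA at rank r stores A (d x r) and B (r x d) per adapted matrix,
# i.e. 2*d*r parameters, replicated per matrix and per layer.
d, r = 1024, 4          # hypothetical hidden size and the paper's rank
matrices, layers = 2, 24  # hypothetical: e.g. query and value in each layer
lora_params = 2 * d * r * matrices * layers

print(bank_params)  # → 23040
print(lora_params)  # → 393216
```

Under these illustrative assumptions, the shared bank is roughly 17x smaller than the per-layer LoRA factors it replaces, which is the sense in which the method is "extreme" in parameter efficiency.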