VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
Authors: Yang Li, Shaobo Han, Jonathan Shihao Ji
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, instruction tuning, and mathematical reasoning tasks. |
| Researcher Affiliation | Collaboration | Yang Li, Dept. of Computer Science, Georgia State University, Atlanta, GA 30303, yli93@student.gsu.edu; Shaobo Han, Optical Networking and Sensing, NEC Laboratories America, Princeton, NJ 08540, shaobo@nec-labs.com; Shihao Ji, School of Computing, University of Connecticut, Storrs, CT 06269, shihao.ji@uconn.edu |
| Pseudocode | Yes | Algorithm 1: Pseudocode of VB-LoRA in a PyTorch-like style (a hedged sketch follows the table). |
| Open Source Code | Yes | Our source code is available at https://github.com/leo-yangli/VB-LoRA. |
| Open Datasets | Yes | We adopt the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018]; GPT-2 Medium and Large models [Radford et al., 2019] on the E2E dataset [Novikova et al., 2017]; the Cleaned Alpaca Dataset; MT-Bench [Zheng et al., 2024]; the MetaMathQA dataset [Yu et al., 2023]; and the GSM8K [Cobbe et al., 2021] and MATH [Hendrycks et al., 2021] datasets. All are accompanied by citations and/or URLs with licensing information. |
| Dataset Splits | Yes | For natural language generation experiments, we fine-tune the GPT-2 Medium and Large models [Radford et al., 2019] on the E2E dataset [Novikova et al., 2017], which contains approximately 42,000 training examples, 4,600 validation examples, and 4,600 test examples from the restaurant domain. We adopt the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018] to assess the performance of VB-LoRA across various natural language understanding tasks. |
| Hardware Specification | Yes | All our experiments were conducted on a server equipped with 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using a 'PyTorch-like style' for pseudocode and integrating into the 'PyTorch framework', as well as the 'QLoRA framework', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We create a vector bank of 90 vectors of length 256, initialized with a uniform distribution U(−0.02, 0.02). The logits are initialized with a normal distribution N(0, 0.01). The learning rates for the vector bank and logit parameters are set to 0.001 and 0.01, respectively. We set the rank to 4 and k = 2 for all our experiments. (See the optimizer sketch after the table.) |
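To make the Pseudocode row concrete, below is a minimal PyTorch sketch of the core VB-LoRA mechanism as the paper describes it: each sub-vector of the low-rank update matrices A and B is composed as a top-k softmax admixture of vectors from a shared vector bank. The class name `VBLoRALinear`, the tiling layout, and the layer shapes are illustrative assumptions, not the authors' released implementation (see their repository for the official code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VBLoRALinear(nn.Module):
    """Hypothetical sketch of a VB-LoRA-adapted linear layer (not the
    authors' code). Sub-vectors of the rank-r update matrices are
    composed as top-k softmax admixtures over a shared vector bank."""

    def __init__(self, base: nn.Linear, bank: nn.Parameter,
                 rank: int = 4, k: int = 2):
        super().__init__()
        self.base = base    # frozen pretrained layer
        self.bank = bank    # shared bank: (num_vectors, vec_len)
        self.rank, self.k = rank, k
        num_vectors, vec_len = bank.shape
        assert base.in_features % vec_len == 0
        assert base.out_features % vec_len == 0
        # number of sub-vectors needed to tile A (r x in) and B^T (r x out)
        n_sub = rank * (base.in_features + base.out_features) // vec_len
        # one logit vector over the bank per sub-vector, init N(0, 0.01)
        self.logits = nn.Parameter(0.01 * torch.randn(n_sub, num_vectors))

    def _compose(self) -> torch.Tensor:
        # keep the k largest logits per sub-vector, softmax over just
        # those, and mix the corresponding bank vectors
        topk_vals, topk_idx = self.logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)    # (n_sub, k)
        selected = self.bank[topk_idx]            # (n_sub, k, vec_len)
        return (weights.unsqueeze(-1) * selected).sum(dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sub = self._compose()
        in_f, out_f = self.base.in_features, self.base.out_features
        n_a = self.rank * in_f // self.bank.shape[1]
        A = sub[:n_a].reshape(self.rank, in_f)    # (r, in)
        Bt = sub[n_a:].reshape(self.rank, out_f)  # B^T: (r, out)
        return self.base(x) + (x @ A.t()) @ Bt
```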
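And a short sketch of the reported setup from the Experiment Setup row: a shared bank of 90 vectors of length 256 initialized from U(−0.02, 0.02), with separate learning rates (0.001 for the bank, 0.01 for the logits) expressed as optimizer parameter groups. The use of AdamW and the 1024-dimensional base layer are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Shared vector bank: 90 vectors of length 256, initialized U(-0.02, 0.02)
bank = nn.Parameter(torch.empty(90, 256).uniform_(-0.02, 0.02))

# Wrap a frozen base layer with the sketch above (dimensions assumed)
base = nn.Linear(1024, 1024)
base.requires_grad_(False)
layer = VBLoRALinear(base, bank, rank=4, k=2)

# Separate learning rates as reported: 0.001 for the bank, 0.01 for logits
optimizer = torch.optim.AdamW([
    {"params": [bank], "lr": 1e-3},
    {"params": [layer.logits], "lr": 1e-2},
])
```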