LoRA+: Efficient Low Rank Adaptation of Large Models
Authors: Soufiane Hayou, Nikhil Ghosh, Bin Yu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our extensive experiments, LoRA+ improves performance (1-2% improvements) and finetuning speed (up to ~2X speedup), at the same computational cost as LoRA. and Our theory is validated with extensive empirical results with different language models and tasks. |
| Researcher Affiliation | Academia | *Equal contribution. 1 Simons Institute, UC Berkeley; 2 Department of Statistics, UC Berkeley. Correspondence to: Soufiane Hayou <hayou@berkeley.edu>, Nikhil Ghosh <nikhil_ghosh@berkeley.edu>. |
| Pseudocode | No | The paper describes mathematical derivations and methodological steps but does not include any formally labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | The code for our experiments is available at https://github.com/nikhil-ghosh-berkeley/loraplus. |
| Open Datasets | Yes | We report our empirical results using LoRA to finetune a set of language models on different benchmarks. and The GLUE benchmark (General Language Understanding Evaluation)... and finetune the Llama-7b model (Touvron et al., 2023) on the MNLI dataset and flan-v2 dataset (Longpre et al., 2023) using LoRA. and The model is trained on a synthetic dataset generated with X ~ N(0, I_d), Y = sin(d^{-1} Σ_{i=1}^{d} X_i). See Appendix C for more details. (An illustrative sketch of this toy data generation appears after the table.) |
| Dataset Splits | No | The paper mentions using well-known benchmarks (GLUE, MNLI, Flan-v2) which typically have predefined splits, and specifies train/test sizes for a toy model ('train data size 1000 and a test data size 100'). However, it does not explicitly state the percentage or sample counts for training, validation, and test splits for all experiments, nor does it explicitly state the use of standard validation splits. |
| Hardware Specification | Yes | GPUs. Nvidia V100, Nvidia A10. |
| Software Dependencies | No | The paper mentions training algorithms like 'AdamW' but does not specify software dependencies with version numbers, such as the Python version, deep learning frameworks (e.g., PyTorch, TensorFlow) with their versions, or other specific libraries and their versions. |
| Experiment Setup | Yes | Learning rate grid. ηA ∈ {4e-3, 2e-3, 1e-3, 5e-4, 2e-4, 1e-4}, ηB ∈ {8e-4, 4e-4, 2e-4, 1e-4, 5e-5, 2e-5, 1e-5}. and Other Hyperparameters. Sequence length T = 128, train batch size bs = 32, number of train epochs E = 3 (E = 10 for SST2), number of random seeds s = 3. (An illustrative sketch of the separate ηA/ηB optimizer setup appears after the table.) |
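
The experiment-setup row sweeps separate learning rates ηA and ηB, which reflects the paper's central idea: giving the LoRA B matrices a different (larger) learning rate than the A matrices. Below is a minimal sketch, not the authors' released implementation, of how such a two-learning-rate AdamW setup can be wired in PyTorch. The parameter-name filters ("lora_A" / "lora_B"), the ratio value, and the helper name `build_loraplus_optimizer` are illustrative assumptions; see the official repository linked above for the actual implementation.

```python
# Minimal sketch (assumptions noted above): one AdamW optimizer with two
# parameter groups, so the LoRA B matrices get a larger learning rate than
# the A matrices. Any other trainable parameters are omitted for brevity.
import torch

def build_loraplus_optimizer(model, lr_A=2e-4, lr_ratio=16.0, weight_decay=0.0):
    """Return AdamW with lr_A for LoRA A matrices and lr_A * lr_ratio for B matrices."""
    params_A, params_B = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "lora_B" in name:      # naming follows the common PEFT convention
            params_B.append(param)
        elif "lora_A" in name:
            params_A.append(param)
    return torch.optim.AdamW(
        [
            {"params": params_A, "lr": lr_A},
            {"params": params_B, "lr": lr_A * lr_ratio},
        ],
        weight_decay=weight_decay,
    )

# Usage (hypothetical PEFT-wrapped model):
# optimizer = build_loraplus_optimizer(peft_model, lr_A=2e-4, lr_ratio=16.0)
```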
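
For the toy-model quote in the Open Datasets row, the synthetic regression data can be reconstructed directly from the stated definitions X ~ N(0, I_d), Y = sin(d^{-1} Σ_i X_i). The sketch below is an illustrative reconstruction rather than code from the paper; the input dimension d and the function name are assumptions, while the 1000/100 train/test sizes come from the Dataset Splits row.

```python
# Minimal sketch under the stated definitions; d and make_toy_data are illustrative.
import numpy as np

def make_toy_data(n_train=1000, n_test=100, d=16, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_train + n_test, d))   # X ~ N(0, I_d)
    Y = np.sin(X.mean(axis=1))                       # Y = sin(d^{-1} * sum_i X_i)
    return (X[:n_train], Y[:n_train]), (X[n_train:], Y[n_train:])
```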