LoRA+: Efficient Low Rank Adaptation of Large Models
Authors: Soufiane Hayou, Nikhil Ghosh, Bin Yu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our extensive experiments, LoRA+ improves performance (1-2% improvements) and finetuning speed (up to ~2X speedup), at the same computational cost as LoRA. and Our theory is validated with extensive empirical results with different language models and tasks. |
| Researcher Affiliation | Academia | *Equal contribution. 1 Simons Institute, UC Berkeley; 2 Department of Statistics, UC Berkeley. Correspondence to: Soufiane Hayou <hayou@berkeley.edu>, Nikhil Ghosh <nikhil_ghosh@berkeley.edu>. |
| Pseudocode | No | The paper describes mathematical derivations and methodological steps but does not include any formally labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | The code for our experiments is available at https://github.com/nikhil-ghosh-berkeley/loraplus. |
| Open Datasets | Yes | We report our empirical results using LoRA to finetune a set of language models on different benchmarks. and The GLUE benchmark (General Language Understanding Evaluation)... and finetune the Llama-7b model (Touvron et al., 2023) on the MNLI dataset and flan-v2 dataset (Longpre et al., 2023) using LoRA. and The model is trained on a synthetic dataset generated with X ~ N(0, I_d), Y = sin(d^{-1} Σ_{i=1}^{d} X_i). See Appendix C for more details. (An illustrative sketch of this toy data generation appears after the table.) |
| Dataset Splits | No | The paper mentions using well-known benchmarks (GLUE, MNLI, Flan-v2) which typically have predefined splits, and specifies train/test sizes for a toy model ('train data size 1000 and a test data size 100'). However, it does not explicitly state the percentage or sample counts for training, validation, and test splits for all experiments, nor does it explicitly state the use of standard validation splits. |
| Hardware Specification | Yes | GPUs. Nvidia V100, Nvidia A10. |
| Software Dependencies | No | The paper mentions training algorithms like 'AdamW' but does not specify software dependencies with version numbers, such as the Python version, deep learning frameworks (e.g., PyTorch, TensorFlow) with their versions, or other specific libraries and their versions. |
| Experiment Setup | Yes | Learning rate grid. ηA ∈ {4e-3, 2e-3, 1e-3, 5e-4, 2e-4, 1e-4}, ηB ∈ {8e-4, 4e-4, 2e-4, 1e-4, 5e-5, 2e-5, 1e-5}. and Other Hyperparameters. Sequence length T = 128, train batch size bs = 32, number of train epochs E = 3 (E = 10 for SST2), number of random seeds s = 3. (An illustrative sketch of the separate ηA/ηB optimizer setup appears after the table.) |
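
The experiment-setup row sweeps separate learning rates ηA and ηB, which reflects the paper's central idea: giving the LoRA B matrices a different (larger) learning rate than the A matrices. Below is a minimal sketch, not the authors' released implementation, of how such a two-learning-rate AdamW setup can be wired in PyTorch. The parameter-name filters ("lora_A" / "lora_B"), the ratio value, and the helper name `build_loraplus_optimizer` are illustrative assumptions; see the official repository linked above for the actual implementation.

```python
# Minimal sketch (assumptions noted above): one AdamW optimizer with two
# parameter groups, so the LoRA B matrices get a larger learning rate than
# the A matrices. Any other trainable parameters are omitted for brevity.
import torch

def build_loraplus_optimizer(model, lr_A=2e-4, lr_ratio=16.0, weight_decay=0.0):
    """Return AdamW with lr_A for LoRA A matrices and lr_A * lr_ratio for B matrices."""
    params_A, params_B = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "lora_B" in name:      # naming follows the common PEFT convention
            params_B.append(param)
        elif "lora_A" in name:
            params_A.append(param)
    return torch.optim.AdamW(
        [
            {"params": params_A, "lr": lr_A},
            {"params": params_B, "lr": lr_A * lr_ratio},
        ],
        weight_decay=weight_decay,
    )

# Usage (hypothetical PEFT-wrapped model):
# optimizer = build_loraplus_optimizer(peft_model, lr_A=2e-4, lr_ratio=16.0)
```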
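
For the toy-model quote in the Open Datasets row, the synthetic regression data can be reconstructed directly from the stated definitions X ~ N(0, I_d), Y = sin(d^{-1} Σ_i X_i). The sketch below is an illustrative reconstruction rather than code from the paper; the input dimension d and the function name are assumptions, while the 1000/100 train/test sizes come from the Dataset Splits row.

```python
# Minimal sketch under the stated definitions; d and make_toy_data are illustrative.
import numpy as np

def make_toy_data(n_train=1000, n_test=100, d=16, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_train + n_test, d))   # X ~ N(0, I_d)
    Y = np.sin(X.mean(axis=1))                       # Y = sin(d^{-1} * sum_i X_i)
    return (X[:n_train], Y[:n_train]), (X[n_train:], Y[n_train:])
```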