The Expressive Power of Low-Rank Adaptation

Authors: Yuchen Zeng, Kangwook Lee

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work pioneers the theoretical analysis of LoRA fine-tuning's expressive capabilities in fully connected neural networks (FNNs) and Transformer networks (TFNs), offering novel insights into how rank, model depth, and proximity to the target model influence LoRA's effectiveness. Our theoretical findings are validated by empirical evidence. (A minimal sketch of the LoRA update appears below the table.)
Researcher Affiliation | Academia | Yuchen Zeng, Department of Computer Science, University of Wisconsin-Madison (yzeng58@wisc.edu); Kangwook Lee, Department of Electrical and Computer Engineering, University of Wisconsin-Madison (kangwook.lee@wisc.edu)
Pseudocode | No | The paper describes its methods and proofs in mathematical and prose form but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | REPRODUCIBILITY STATEMENT: The code for all experiments reported in this paper is publicly accessible. For the purpose of reproducibility, the code can be found at the following anonymized GitHub repository: https://github.com/UW-Madison-Lee-Lab/Expressive_Power_of_LoRA.
Open Datasets | Yes | We perform experiments on both synthetic and real datasets to substantiate our theoretical results... We also conduct experiments on real datasets to further support our theoretical insights in real-world scenarios... GLUE benchmark (Wang et al., 2018).
Dataset Splits | Yes | The optimal configuration is determined based on the validation loss on a set of 256 samples independently drawn from a standard normal distribution.
Hardware Specification | Yes | Our experiments are conducted using Tesla V100-PCIE-16GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-40GB, and NVIDIA L40 GPUs.
Software Dependencies | No | The paper mentions using "PyTorch" for initialization and the "Adam optimizer" but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We utilize the Adam optimizer. We tune the learning rate over {10^-2, 10^-3, 10^-4} and the weight decay over {0, 10^-2, 10^-3, 10^-4}. The optimal configuration is determined based on the validation loss on a set of 256 samples independently drawn from a standard normal distribution. We run 5,000 iterations for each hyperparameter setting, where at each step 256 fresh standard Gaussian samples are generated for loss and gradient computation. (A sketch of this training loop follows below.)
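
For quick reference on the Research Type row, here is a minimal sketch of the low-rank update the paper analyzes: a frozen weight matrix W0 adapted by a trainable rank-r product BA. The class name `LoRALinear`, the zero initialization of B, and the `alpha` scaling are illustrative assumptions in the style of common LoRA implementations, not the authors' code.

```python
# Minimal sketch of the low-rank update W0 + BA analyzed in the paper.
# Names (LoRALinear, rank, alpha) are illustrative assumptions, not the
# paper's reference implementation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer W0 plus a trainable rank-r update B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # W0 stays frozen during adaptation
        # Low-rank factors: A is (r x in), B is (out x r); B starts at zero so the
        # adapted model coincides with the frozen model at initialization.
        self.A = nn.Parameter(torch.randn(rank, in_features) / rank ** 0.5)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W0 + (alpha/r) * B @ A, a rank-r perturbation of W0.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only `A` and `B` receive gradients, so the number of trainable parameters scales with the rank r rather than with the full weight matrix.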
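
The Experiment Setup and Dataset Splits rows describe a synthetic-data hyperparameter sweep; below is a hedged sketch of that loop under stated assumptions. `build_lora_model`, `target_fn`, and `dim` are placeholders standing in for the LoRA-adapted model, the target model, and the input dimension; the authors' actual implementation is in the repository linked above.

```python
# Sketch of the sweep described in the Experiment Setup row: Adam, learning
# rates {1e-2, 1e-3, 1e-4}, weight decay {0, 1e-2, 1e-3, 1e-4}, 5,000 iterations
# with 256 fresh standard Gaussian samples per step, and model selection on a
# held-out batch of 256 Gaussian validation samples.
# build_lora_model, target_fn, and dim are hypothetical placeholders.
import itertools
import torch


def run_sweep(build_lora_model, target_fn, dim, steps=5_000, batch=256, device="cpu"):
    best_val, best_cfg = float("inf"), None
    val_x = torch.randn(batch, dim, device=device)        # fixed validation split
    for lr, wd in itertools.product([1e-2, 1e-3, 1e-4], [0.0, 1e-2, 1e-3, 1e-4]):
        model = build_lora_model().to(device)
        opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
        for _ in range(steps):
            x = torch.randn(batch, dim, device=device)    # fresh Gaussian samples each step
            loss = torch.mean((model(x) - target_fn(x)) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            val = torch.mean((model(val_x) - target_fn(val_x)) ** 2).item()
        if val < best_val:
            best_val, best_cfg = val, (lr, wd)
    return best_cfg, best_val
```

Selecting the configuration by validation loss on a fixed held-out Gaussian batch mirrors the split described in the Dataset Splits row, while drawing fresh samples at every step matches the reported training procedure.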