The Expressive Power of Low-Rank Adaptation

Authors: Yuchen Zeng, Kangwook Lee

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work pioneers the theoretical analysis of LoRA fine-tuning's expressive capabilities in fully connected neural networks (FNNs) and Transformer networks (TFNs), offering novel insights into how rank, model depth, and proximity to the target model influence LoRA's effectiveness. Our theoretical findings are validated by empirical evidence. (A minimal sketch of the LoRA update appears below the table.)
Researcher Affiliation | Academia | Yuchen Zeng, Department of Computer Science, University of Wisconsin-Madison (yzeng58@wisc.edu); Kangwook Lee, Department of Electrical and Computer Engineering, University of Wisconsin-Madison (kangwook.lee@wisc.edu)
Pseudocode | No | The paper describes its methods and proofs in mathematical and prose form but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | REPRODUCIBILITY STATEMENT: The code for all experiments reported in this paper is publicly accessible. For the purpose of reproducibility, the code can be found at the following anonymized GitHub repository: https://github.com/UW-Madison-Lee-Lab/Expressive_Power_of_LoRA.
Open Datasets | Yes | We perform experiments on both synthetic and real datasets to substantiate our theoretical results... We also conduct experiments on real datasets to further support our theoretical insights in real-world scenarios... GLUE benchmark (Wang et al., 2018).
Dataset Splits | Yes | The optimal configuration is determined based on the validation loss on a set of 256 samples independently drawn from a standard normal distribution.
Hardware Specification | Yes | Our experiments are conducted using Tesla V100-PCIE-16GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-40GB, and NVIDIA L40 GPUs.
Software Dependencies | No | The paper mentions using "PyTorch" for initialization and the "Adam optimizer" but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We utilize the Adam optimizer. We tune the learning rate over {10^-2, 10^-3, 10^-4} and the weight decay over {0, 10^-2, 10^-3, 10^-4}. The optimal configuration is determined based on the validation loss on a set of 256 samples independently drawn from a standard normal distribution. We run 5,000 iterations for each hyperparameter setting, where at each step 256 fresh standard Gaussian samples are generated for loss and gradient computation. (A sketch of this training loop follows below.)
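
For quick reference on the Research Type row, here is a minimal sketch of the low-rank update the paper analyzes: a frozen weight matrix W0 adapted by a trainable rank-r product BA. The class name `LoRALinear`, the zero initialization of B, and the `alpha` scaling are illustrative assumptions in the style of common LoRA implementations, not the authors' code.

```python
# Minimal sketch of the low-rank update W0 + BA analyzed in the paper.
# Names (LoRALinear, rank, alpha) are illustrative assumptions, not the
# paper's reference implementation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer W0 plus a trainable rank-r update B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # W0 stays frozen during adaptation
        # Low-rank factors: A is (r x in), B is (out x r); B starts at zero so the
        # adapted model coincides with the frozen model at initialization.
        self.A = nn.Parameter(torch.randn(rank, in_features) / rank ** 0.5)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W0 + (alpha/r) * B @ A, a rank-r perturbation of W0.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only `A` and `B` receive gradients, so the number of trainable parameters scales with the rank r rather than with the full weight matrix.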
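
The Experiment Setup and Dataset Splits rows describe a synthetic-data hyperparameter sweep; below is a hedged sketch of that loop under stated assumptions. `build_lora_model`, `target_fn`, and `dim` are placeholders standing in for the LoRA-adapted model, the target model, and the input dimension; the authors' actual implementation is in the repository linked above.

```python
# Sketch of the sweep described in the Experiment Setup row: Adam, learning
# rates {1e-2, 1e-3, 1e-4}, weight decay {0, 1e-2, 1e-3, 1e-4}, 5,000 iterations
# with 256 fresh standard Gaussian samples per step, and model selection on a
# held-out batch of 256 Gaussian validation samples.
# build_lora_model, target_fn, and dim are hypothetical placeholders.
import itertools
import torch


def run_sweep(build_lora_model, target_fn, dim, steps=5_000, batch=256, device="cpu"):
    best_val, best_cfg = float("inf"), None
    val_x = torch.randn(batch, dim, device=device)        # fixed validation split
    for lr, wd in itertools.product([1e-2, 1e-3, 1e-4], [0.0, 1e-2, 1e-3, 1e-4]):
        model = build_lora_model().to(device)
        opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
        for _ in range(steps):
            x = torch.randn(batch, dim, device=device)    # fresh Gaussian samples each step
            loss = torch.mean((model(x) - target_fn(x)) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            val = torch.mean((model(val_x) - target_fn(val_x)) ** 2).item()
        if val < best_val:
            best_val, best_cfg = val, (lr, wd)
    return best_cfg, best_val
```

Selecting the configuration by validation loss on a fixed held-out Gaussian batch mirrors the split described in the Dataset Splits row, while drawing fresh samples at every step matches the reported training procedure.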