The Expressive Power of Low-Rank Adaptation
Authors: Yuchen Zeng, Kangwook Lee
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work pioneers the theoretical analysis of LoRA fine-tuning's expressive capabilities in FNNs and TFNs, offering novel insights into how rank, model depth, and proximity to the target model influence LoRA's effectiveness. Our theoretical findings are validated by empirical evidence. |
| Researcher Affiliation | Academia | Yuchen Zeng, Department of Computer Science, University of Wisconsin-Madison, yzeng58@wisc.edu; Kangwook Lee, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, kangwook.lee@wisc.edu |
| Pseudocode | No | The paper describes its methods and proofs in mathematical and prose format but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | Yes | REPRODUCIBILITY STATEMENT: The code for all experiments reported in this paper is publicly accessible. For the purpose of reproducibility, the code can be found at the following anonymized GitHub repository: https://github.com/UW-Madison-Lee-Lab/Expressive_Power_of_LoRA. |
| Open Datasets | Yes | We perform experiments on both synthetic and real datasets to substantiate our theoretical results... We also conduct experiments on real datasets to further support our theoretical insights in real-world scenarios... GLUE benchmark (Wang et al., 2018). |
| Dataset Splits | Yes | The optimal configuration is determined based on the validation loss on a set of 256 samples independently drawn from a standard normal distribution. |
| Hardware Specification | Yes | Our experiments are conducted using Tesla V100-PCIE-16GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-40GB, and NVIDIA L40 GPUs. |
| Software Dependencies | No | The paper mentions using "PyTorch" for initialization and the "Adam optimizer" but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We utilize the Adam optimizer. We tune the learning rate over {10^-2, 10^-3, 10^-4} and the weight decay over {0, 10^-2, 10^-3, 10^-4}. The optimal configuration is determined based on the validation loss on a set of 256 samples independently drawn from a standard normal distribution. We run 5,000 iterations for each hyperparameter setting, where at each step 256 fresh standard Gaussian samples are generated for loss and gradient computation. |
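
The quoted setup (Adam, a small grid over learning rates and weight decays, 5,000 steps with 256 fresh standard Gaussian samples per step, and validation on 256 independently drawn Gaussian samples) can be illustrated with a minimal PyTorch sketch. This is a hypothetical reconstruction, not the authors' released code: the model dimension, LoRA rank, and the frozen target layer are illustrative assumptions.

```python
# Hypothetical sketch of the synthetic-data setup quoted above: a LoRA-style
# low-rank update on a frozen linear layer, trained to match a frozen target
# model on fresh Gaussian batches. Dimensions and rank are assumed.
import itertools
import torch
import torch.nn as nn

DIM, RANK, STEPS, BATCH = 16, 2, 5_000, 256  # assumed dimension/rank; steps/batch as quoted

class LoRALinear(nn.Module):
    """Frozen weight W0 plus a trainable rank-RANK update B @ A."""
    def __init__(self, dim, rank):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(dim, dim), requires_grad=False)
        self.A = nn.Parameter(torch.randn(rank, dim) / dim**0.5)
        self.B = nn.Parameter(torch.zeros(dim, rank))

    def forward(self, x):
        return x @ (self.W0 + self.B @ self.A).T

def train(lr, weight_decay):
    torch.manual_seed(0)
    target = nn.Linear(DIM, DIM, bias=False)  # frozen "target" model (assumption)
    for p in target.parameters():
        p.requires_grad_(False)
    model = LoRALinear(DIM, RANK)
    opt = torch.optim.Adam([model.A, model.B], lr=lr, weight_decay=weight_decay)
    for _ in range(STEPS):
        x = torch.randn(BATCH, DIM)            # 256 fresh standard Gaussian samples per step
        loss = ((model(x) - target(x)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    x_val = torch.randn(256, DIM)              # validation set of 256 independent Gaussian samples
    return ((model(x_val) - target(x_val)) ** 2).mean().item()

# Grid search over the learning rates and weight decays quoted above.
grid = itertools.product([1e-2, 1e-3, 1e-4], [0, 1e-2, 1e-3, 1e-4])
best = min(grid, key=lambda cfg: train(*cfg))
print("best (lr, weight_decay):", best)
```

Picking the configuration by validation loss, as in the last line, mirrors the quoted selection criterion; running the full 12-point grid for 5,000 steps each is slow on CPU and is included only to keep the sketch faithful to the described protocol.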