Unveiling LoRA Intrinsic Ranks via Salience Analysis
Authors: Wenjun Ke, Jiahao Wang, Peng Wang, Jiajun Liu, Dong Nie, Guozheng Li, Yining Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the generality of our method across various tasks, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large model instruction tuning tasks. Experimental results demonstrate the superiority of SalientLoRA, which outperforms state-of-the-art methods by 0.96%-3.56% on multiple datasets. |
| Researcher Affiliation | Collaboration | Wenjun Ke¹,², Jiahao Wang¹, Peng Wang¹,²*, Jiajun Liu¹, Dong Nie³, Guozheng Li¹, and Yining Li¹ — ¹School of Computer Science and Engineering, Southeast University; ²Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education; ³Meta Inc. |
| Pseudocode | Yes | Algorithm 1 De-Cycling Algorithm for Dependency Graphs |
| Open Source Code | Yes | The code is publicly available at https://github.com/Heyest/SalientLoRA. |
| Open Datasets | Yes | To evaluate the applicability of our fine-tuning approach across multiple tasks and various models, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large-scale model instruction fine-tuning tasks, respectively. The specific datasets chosen for each task and the statistics will be detailed in Section 6.2, 6.3, 6.4 and Appendix A. (Refers to the GLUE, XSum, CNN/Daily Mail, and MT datasets, which are well-known public benchmarks; Appendix A provides statistics tables for the GLUE, XSum, and CNN/Daily Mail datasets.) |
| Dataset Splits | Yes | Datasets To evaluate the applicability of our fine-tuning approach across multiple tasks and various models, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large-scale model instruction fine-tuning tasks, respectively. The specific datasets chosen for each task and the statistics will be detailed in Section 6.2, 6.3, 6.4 and Appendix A. (Appendix A, Table 5 shows '#Train #Test #Dev' columns for GLUE benchmark datasets. Table 6 shows '#Train #Test #Dev' columns for XSum and CNN/Daily Mail datasets). |
| Hardware Specification | Yes | Our experiments are conducted on four NVIDIA RTX 3090Ti GPUs for NLU and NVIDIA Ampere A100 for NLG and instruction tuning tasks. |
| Software Dependencies | No | The paper mentions implementing their approach and running experiments but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | During salience measurement, the slope threshold for dependency calculation γ = 2, the correlation threshold β = 0.9, and the λ is set to 0.7. For adaptive time-window adjustment, the initial time window size T_i = 10, the final time window size T_f = 200, and the initial total rank r_i is set to 7.5 times the target total rank r_t. ... We select the learning rate from {8·10⁻⁵, 5·10⁻⁵, 3·10⁻⁵, 1·10⁻⁴, 3·10⁻⁴, 5·10⁻⁴, 8·10⁻⁴, 1·10⁻³}, and pick the best-performing learning rate for every method. Further details on other hyperparameters are shown in Appendix C. (Appendix C, Table 9 provides detailed hyperparameters for each task, including learning rate, batch size, epochs, r_t, n_i, and n_f.) |
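
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal illustration, not the authors' implementation (their code is at https://github.com/Heyest/SalientLoRA): the dataclass, its field names, the placeholder target rank, and the linear schedule used for the time window are assumptions, since the quoted text only states the initial and final window sizes (T_i = 10, T_f = 200) and that the initial total rank is 7.5× the target.

```python
from dataclasses import dataclass


# Hypothetical config collecting the hyperparameters reported for SalientLoRA.
# Field names and the interpolation scheme below are illustrative assumptions,
# not the authors' code.
@dataclass
class SalientLoRAConfig:
    gamma: float = 2.0       # slope threshold for dependency calculation
    beta: float = 0.9        # correlation threshold
    lam: float = 0.7         # lambda weighting term
    t_init: int = 10         # initial time-window size T_i
    t_final: int = 200       # final time-window size T_f
    target_total_rank: int = 144       # r_t; task-dependent, placeholder value
    init_rank_multiplier: float = 7.5  # r_i = 7.5 * r_t
    # Learning-rate grid searched in the paper; best value picked per method.
    learning_rates: tuple = (8e-5, 5e-5, 3e-5, 1e-4, 3e-4, 5e-4, 8e-4, 1e-3)

    @property
    def initial_total_rank(self) -> int:
        return int(self.init_rank_multiplier * self.target_total_rank)


def time_window(step: int, total_steps: int, cfg: SalientLoRAConfig) -> int:
    """Assumed linear growth of the time window from T_i to T_f over training.

    The paper calls this 'adaptive time-window adjustment'; the exact schedule
    is not given in the quoted text, so a linear ramp is used here purely for
    illustration.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return int(round(cfg.t_init + frac * (cfg.t_final - cfg.t_init)))


cfg = SalientLoRAConfig()
print(cfg.initial_total_rank)        # 1080 for the placeholder target rank
print(time_window(0, 1000, cfg))     # 10
print(time_window(1000, 1000, cfg))  # 200
```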