Unveiling LoRA Intrinsic Ranks via Salience Analysis
Authors: Wenjun Ke, Jiahao Wang, Peng Wang, Jiajun Liu, Dong Nie, Guozheng Li, Yining Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the generality of our method across various tasks, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large model instruction tuning tasks. Experimental results demonstrate the superiority of SalientLoRA, which outperforms state-of-the-art methods by 0.96%-3.56% on multiple datasets. |
| Researcher Affiliation | Collaboration | Wenjun Ke¹,², Jiahao Wang¹, Peng Wang¹,²*, Jiajun Liu¹, Dong Nie³, Guozheng Li¹, and Yining Li¹ — ¹School of Computer Science and Engineering, Southeast University; ²Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education; ³Meta Inc. |
| Pseudocode | Yes | Algorithm 1 De-Cycling Algorithm for Dependency Graphs |
| Open Source Code | Yes | The code is publicly available at https://github.com/Heyest/SalientLoRA. |
| Open Datasets | Yes | To evaluate the applicability of our fine-tuning approach across multiple tasks and various models, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large-scale model instruction fine-tuning tasks, respectively. The specific datasets chosen for each task and the statistics will be detailed in Section 6.2, 6.3, 6.4 and Appendix A. (Refers to the GLUE, XSum, CNN/Daily Mail, and MT datasets, which are well-known public benchmarks; Appendix A provides statistics tables for the GLUE, XSum, and CNN/Daily Mail datasets.) |
| Dataset Splits | Yes | Datasets To evaluate the applicability of our fine-tuning approach across multiple tasks and various models, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large-scale model instruction fine-tuning tasks, respectively. The specific datasets chosen for each task and the statistics will be detailed in Section 6.2, 6.3, 6.4 and Appendix A. (Appendix A, Table 5 shows '#Train #Test #Dev' columns for GLUE benchmark datasets. Table 6 shows '#Train #Test #Dev' columns for XSum and CNN/Daily Mail datasets). |
| Hardware Specification | Yes | Our experiments are conducted on four NVIDIA RTX 3090Ti GPUs for NLU and NVIDIA Ampere A100 for NLG and instruction tuning tasks. |
| Software Dependencies | No | The paper mentions implementing their approach and running experiments but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | During salience measurement, the slope threshold for dependency calculation γ = 2, the correlation threshold β = 0.9, and the λ is set to 0.7. For adaptive time-window adjustment, the initial time window size T_i = 10, the final time window size T_f = 200, and the initial total rank r_i is set to 7.5 times the target total rank r_t. ... We select the learning rate from {8·10⁻⁵, 5·10⁻⁵, 3·10⁻⁵, 1·10⁻⁴, 3·10⁻⁴, 5·10⁻⁴, 8·10⁻⁴, 1·10⁻³}, and pick the best-performing learning rate for every method. Further details on other hyperparameters are shown in Appendix C. (Appendix C, Table 9 provides detailed hyperparameters for each task, including learning rate, batch size, epochs, r_t, n_i, and n_f.) |
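
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal illustration, not the authors' implementation (their code is at https://github.com/Heyest/SalientLoRA): the dataclass, its field names, the placeholder target rank, and the linear schedule used for the time window are assumptions, since the quoted text only states the initial and final window sizes (T_i = 10, T_f = 200) and that the initial total rank is 7.5× the target.

```python
from dataclasses import dataclass


# Hypothetical config collecting the hyperparameters reported for SalientLoRA.
# Field names and the interpolation scheme below are illustrative assumptions,
# not the authors' code.
@dataclass
class SalientLoRAConfig:
    gamma: float = 2.0       # slope threshold for dependency calculation
    beta: float = 0.9        # correlation threshold
    lam: float = 0.7         # lambda weighting term
    t_init: int = 10         # initial time-window size T_i
    t_final: int = 200       # final time-window size T_f
    target_total_rank: int = 144       # r_t; task-dependent, placeholder value
    init_rank_multiplier: float = 7.5  # r_i = 7.5 * r_t
    # Learning-rate grid searched in the paper; best value picked per method.
    learning_rates: tuple = (8e-5, 5e-5, 3e-5, 1e-4, 3e-4, 5e-4, 8e-4, 1e-3)

    @property
    def initial_total_rank(self) -> int:
        return int(self.init_rank_multiplier * self.target_total_rank)


def time_window(step: int, total_steps: int, cfg: SalientLoRAConfig) -> int:
    """Assumed linear growth of the time window from T_i to T_f over training.

    The paper calls this 'adaptive time-window adjustment'; the exact schedule
    is not given in the quoted text, so a linear ramp is used here purely for
    illustration.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return int(round(cfg.t_init + frac * (cfg.t_final - cfg.t_init)))


cfg = SalientLoRAConfig()
print(cfg.initial_total_rank)        # 1080 for the placeholder target rank
print(time_window(0, 1000, cfg))     # 10
print(time_window(1000, 1000, cfg))  # 200
```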