Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Authors: Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. |
| Researcher Affiliation | Collaboration | Georgia Institute of Technology, Princeton University, Microsoft Azure AI |
| Pseudocode | Yes | Algorithm 1 AdaLoRA (a hedged sketch of the algorithm's core mechanism appears after this table) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/QingruZhang/AdaLoRA. |
| Open Datasets | Yes | We conduct experiments on the General Language Understanding Evaluation (GLUE, Wang et al. 2019) benchmark. ... SQuAD v1.1 (Rajpurkar et al., 2016) and SQuAD v2.0 (Rajpurkar et al., 2018)... XSum (Narayan et al., 2018) and CNN/DailyMail (Hermann et al., 2015). |
| Dataset Splits | Yes | Table 5: Summary of the GLUE benchmark (columns: Corpus, Task, #Train, #Dev, #Test, #Label, Metrics) |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA V100 GPUs. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) to implement all the algorithms. Our implementation is based on the publicly available Huggingface Transformers (Wolf et al., 2019) codebase. The paper mentions PyTorch and Huggingface Transformers but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We compare AdaLoRA with the baselines under different budget levels, for example, given the total trainable parameters as 0.3/0.6/1.2 million. In order to match the parameter budget, we select the hidden dimensions of adapters from {8, 16, 32, 64}, set the rank r of LoRA as {2, 4, 8}, and choose the final budget b(T) of AdaLoRA from {144, 288, 576}. Then we set b(0) to 1.5 times b(T) for AdaLoRA and select the regularization coefficient γ from {0.1, 0.3, 0.5}. We set the exponential moving average parameters β₁ and β₂ to their default value 0.85. We select the learning rate from {5×10⁻⁵, 8×10⁻⁵, 1×10⁻⁴, 2×10⁻⁴}. More details are presented in Appendix E. ... We set the batch size as 16. (A hedged sketch of the budget schedule follows the table.) |
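
To make the pseudocode row above concrete, below is a minimal PyTorch sketch of the mechanism Algorithm 1 describes: each adapted weight carries an SVD-style low-rank update P·diag(E)·Q, and the entries of E are pruned by an EMA-smoothed sensitivity score to meet a global rank budget. This is not the authors' implementation (that lives at https://github.com/QingruZhang/AdaLoRA); the names `SVDLinear` and `prune_to_budget` are our assumptions, and the score omits the paper's second EMA (β₂) over the uncertainty term.

```python
# Minimal sketch of AdaLoRA's core ideas (assumed names, not the official code).
import torch
import torch.nn as nn


class SVDLinear(nn.Module):
    """Frozen base weight W0 plus an SVD-style low-rank update P diag(E) Q."""

    def __init__(self, in_features: int, out_features: int, r: int = 12):
        super().__init__()
        # Placeholder for the frozen pre-trained W0 (loaded from a checkpoint in practice).
        self.weight = nn.Parameter(torch.zeros(out_features, in_features),
                                   requires_grad=False)
        self.P = nn.Parameter(torch.randn(out_features, r) * 0.01)
        self.E = nn.Parameter(torch.zeros(r))  # trainable "singular values"
        self.Q = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.register_buffer("sens", torch.zeros(r))  # EMA of sensitivity scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.P @ torch.diag(self.E) @ self.Q
        return x @ (self.weight + delta).T

    def orth_penalty(self) -> torch.Tensor:
        # Added to the loss as gamma * (||P^T P - I||_F^2 + ||Q Q^T - I||_F^2),
        # keeping P and Q near-orthogonal as in the paper's regularizer.
        I = torch.eye(self.E.numel(), device=self.E.device)
        return (((self.P.T @ self.P - I) ** 2).sum()
                + ((self.Q @ self.Q.T - I) ** 2).sum())


@torch.no_grad()
def prune_to_budget(layers: list[SVDLinear], budget: int, beta1: float = 0.85) -> None:
    """Call after loss.backward(): zero the least important singular values so
    that at most `budget` triplets (P_i, E_i, Q_i) stay active across layers."""
    for layer in layers:
        s = (layer.E * layer.E.grad).abs()            # sensitivity |theta * grad|
        layer.sens.mul_(beta1).add_((1 - beta1) * s)  # EMA smoothing with beta1
    all_scores = torch.cat([layer.sens for layer in layers])
    if all_scores.numel() <= budget:
        return
    # Keep the `budget` largest scores; zero out the rest (ties may keep a few extra).
    threshold = torch.topk(all_scores, budget).values.min()
    for layer in layers:
        layer.E[layer.sens < threshold] = 0.0
```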
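
The experiment-setup row quotes b(0) = 1.5 × b(T): in the paper, the global budget b(t) is held at b(0) during warmup and then annealed to b(T) with a cubic schedule. A hedged sketch follows, assuming illustrative warmup and final-phase step counts (`t_init` and `t_final` here are not the tuned values from Appendix E):

```python
# Hedged sketch of the cubic budget schedule: hold b(0) = 1.5 * b(T) during
# warmup, decay cubically, then hold the final budget b(T).
def budget_schedule(step: int, total_steps: int, b_final: int = 576,
                    t_init: int = 500, t_final: int = 1000) -> int:
    b_init = int(1.5 * b_final)  # b(0) = 1.5 * b(T), as quoted in the table
    if step < t_init:
        return b_init
    if step >= total_steps - t_final:
        return b_final
    progress = (step - t_init) / (total_steps - t_init - t_final)
    return b_final + int((b_init - b_final) * (1.0 - progress) ** 3)
```

With b_final = 576 (the largest final budget quoted above), this schedule holds 864 active singular values during warmup and anneals down to 576 by the final phase, letting the allocator explore before committing the budget.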