Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Authors: Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. |
| Researcher Affiliation | Collaboration | Georgia Institute of Technology, Princeton University, Microsoft Azure AI |
| Pseudocode | Yes | Algorithm 1 AdaLoRA (a hedged sketch of the algorithm's core mechanism appears after this table) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/QingruZhang/AdaLoRA. |
| Open Datasets | Yes | We conduct experiments on the General Language Understanding Evaluation (GLUE, Wang et al. 2019) benchmark. ... SQuAD v1.1 (Rajpurkar et al., 2016) and SQuAD v2.0 (Rajpurkar et al., 2018)... XSum (Narayan et al., 2018) and CNN/DailyMail (Hermann et al., 2015). |
| Dataset Splits | Yes | Table 5: Summary of the GLUE benchmark (columns: Corpus, Task, #Train, #Dev, #Test, #Label, Metrics) |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA V100 GPUs. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) to implement all the algorithms. Our implementation is based on the publicly available Huggingface Transformers (Wolf et al., 2019) codebase. The paper mentions PyTorch and Huggingface Transformers but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We compare AdaLoRA with the baselines under different budget levels, for example, given the total trainable parameters as 0.3/0.6/1.2 million. In order to match the parameter budget, we select the hidden dimensions of adapters from {8, 16, 32, 64}, set the rank r of LoRA as {2, 4, 8}, and choose the final budget b(T) of AdaLoRA from {144, 288, 576}. Then we set b(0) to 1.5 times b(T) for AdaLoRA and select the regularization coefficient γ from {0.1, 0.3, 0.5}. We set the exponential moving average parameters β₁ and β₂ to their default value 0.85. We select the learning rate from {5×10⁻⁵, 8×10⁻⁵, 1×10⁻⁴, 2×10⁻⁴}. More details are presented in Appendix E. ... We set the batch size as 16. (A hedged sketch of the budget schedule follows the table.) |
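
To make the pseudocode row above concrete, below is a minimal PyTorch sketch of the mechanism Algorithm 1 describes: each adapted weight carries an SVD-style low-rank update P·diag(E)·Q, and the entries of E are pruned by an EMA-smoothed sensitivity score to meet a global rank budget. This is not the authors' implementation (that lives at https://github.com/QingruZhang/AdaLoRA); the names `SVDLinear` and `prune_to_budget` are our assumptions, and the score omits the paper's second EMA (β₂) over the uncertainty term.

```python
# Minimal sketch of AdaLoRA's core ideas (assumed names, not the official code).
import torch
import torch.nn as nn


class SVDLinear(nn.Module):
    """Frozen base weight W0 plus an SVD-style low-rank update P diag(E) Q."""

    def __init__(self, in_features: int, out_features: int, r: int = 12):
        super().__init__()
        # Placeholder for the frozen pre-trained W0 (loaded from a checkpoint in practice).
        self.weight = nn.Parameter(torch.zeros(out_features, in_features),
                                   requires_grad=False)
        self.P = nn.Parameter(torch.randn(out_features, r) * 0.01)
        self.E = nn.Parameter(torch.zeros(r))  # trainable "singular values"
        self.Q = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.register_buffer("sens", torch.zeros(r))  # EMA of sensitivity scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.P @ torch.diag(self.E) @ self.Q
        return x @ (self.weight + delta).T

    def orth_penalty(self) -> torch.Tensor:
        # Added to the loss as gamma * (||P^T P - I||_F^2 + ||Q Q^T - I||_F^2),
        # keeping P and Q near-orthogonal as in the paper's regularizer.
        I = torch.eye(self.E.numel(), device=self.E.device)
        return (((self.P.T @ self.P - I) ** 2).sum()
                + ((self.Q @ self.Q.T - I) ** 2).sum())


@torch.no_grad()
def prune_to_budget(layers: list[SVDLinear], budget: int, beta1: float = 0.85) -> None:
    """Call after loss.backward(): zero the least important singular values so
    that at most `budget` triplets (P_i, E_i, Q_i) stay active across layers."""
    for layer in layers:
        s = (layer.E * layer.E.grad).abs()            # sensitivity |theta * grad|
        layer.sens.mul_(beta1).add_((1 - beta1) * s)  # EMA smoothing with beta1
    all_scores = torch.cat([layer.sens for layer in layers])
    if all_scores.numel() <= budget:
        return
    # Keep the `budget` largest scores; zero out the rest (ties may keep a few extra).
    threshold = torch.topk(all_scores, budget).values.min()
    for layer in layers:
        layer.E[layer.sens < threshold] = 0.0
```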
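
The experiment-setup row quotes b(0) = 1.5 × b(T): in the paper, the global budget b(t) is held at b(0) during warmup and then annealed to b(T) with a cubic schedule. A hedged sketch follows, assuming illustrative warmup and final-phase step counts (`t_init` and `t_final` here are not the tuned values from Appendix E):

```python
# Hedged sketch of the cubic budget schedule: hold b(0) = 1.5 * b(T) during
# warmup, decay cubically, then hold the final budget b(T).
def budget_schedule(step: int, total_steps: int, b_final: int = 576,
                    t_init: int = 500, t_final: int = 1000) -> int:
    b_init = int(1.5 * b_final)  # b(0) = 1.5 * b(T), as quoted in the table
    if step < t_init:
        return b_init
    if step >= total_steps - t_final:
        return b_final
    progress = (step - t_init) / (total_steps - t_init - t_final)
    return b_final + int((b_init - b_final) * (1.0 - progress) ** 3)
```

With b_final = 576 (the largest final budget quoted above), this schedule holds 864 active singular values during warmup and anneals down to 576 by the final phase, letting the allocator explore before committing the budget.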