Bayesian Low-rank Adaptation for Large Language Models

Authors: Adam X. Yang, Maxime Robeyns, Xi Wang, Laurence Aitchison

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assessed the efficacy of Laplace-LoRA by evaluating the negative log-likelihood and expected calibration error of LLaMA2-7B during fine-tuning on common-sense reasoning tasks. (A sketch of the expected calibration error computation appears below the table.)
Researcher Affiliation | Academia | 1: University of Bristol; 2: University of Massachusetts, Amherst. {adam.yang,maxime.robeyns.2018,laurence.aitchison}@bristol.ac.uk, xwang3@cs.umass.edu
Pseudocode | Yes | Algorithm 1: Memory-efficient estimate of a low-rank B such that BB^T ≈ Σ_{t=1}^T b_t b_t^T. Algorithm 2: Optimize Laplace prior precision using the training-set model evidence. Algorithm 3: Optimize Laplace prior precision using the validation log-likelihood. (A sketch of Algorithm 1's low-rank accumulation appears below the table.)
Open Source Code | Yes | We open-sourced an original implementation based on Laplace Redux (Daxberger et al., 2021a) and ASDL (Osawa et al., 2023) at https://github.com/adamxyang/laplace-lora, and a newer standalone implementation, which we intend to support going forward, at https://github.com/MaximeRobeyns/bayesian_lora.
Open Datasets | Yes | We began our evaluation with in-distribution fine-tuning on the following common-sense reasoning tasks: Winogrande-small (WG-S) and Winogrande-medium (WG-M) (Sakaguchi et al., 2021), ARC-Challenge (ARC-C) and ARC-Easy (ARC-E) (Clark et al., 2018), OpenBookQA (OBQA) (Mihaylov et al., 2018), and BoolQ (Clark et al., 2019). (A hedged loading-and-splitting sketch for these benchmarks appears below the table.)
Dataset Splits | Yes | Next, we considered a more standard setting, in which the training set was partitioned into an 80% subset for training and a 20% validation subset.
Hardware Specification | No | The paper mentions the 'computational facilities of the Advanced Computing Research Centre, University of Bristol' and discusses memory requirements in GB, but it does not specify the exact hardware used, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using the PEFT library (Mangrulkar et al., 2022), Hugging Face (Wolf et al., 2020), Laplace Redux (Daxberger et al., 2021a), and ASDL (Osawa et al., 2023), but it does not provide version numbers for these dependencies.
Experiment Setup | Yes | Fine-tuning was carried out with a batch size of 4 for 10,000 iterations. For True/False or multiple-choice questions, we selected the next-token logits corresponding to True/False or A/B/C/D, depending on the dataset (see Appendix B for the prompt templates), and fine-tuned the LLM to maximize the likelihood of the correct token. Table 7 details: LoRA r 8, LoRA α 16, dropout probability 0.1, weight decay 0, learning rate 5e-5, learning-rate scheduler linear, batch size 4, max sequence length 300. (A hedged configuration sketch appears below the table.)
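
The Research Type row references expected calibration error. For reference, the sketch below computes a standard equal-width-bin ECE from predicted class probabilities; it is a generic illustration of the metric, not code from the paper, and the bin count is an assumption.

```python
# Standard expected calibration error (ECE) with equal-width confidence bins.
# Generic illustration of the metric named above, not the paper's code; the
# number of bins is an assumption.
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """probs: (N, C) predicted class probabilities; labels: (N,) integer class labels."""
    confidences = probs.max(axis=1)        # confidence of the predicted class
    predictions = probs.argmax(axis=1)     # predicted class index
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |mean accuracy - mean confidence| in the bin, weighted by the bin's share of samples
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece
```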
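
The Pseudocode row names Algorithm 1, a memory-efficient estimate of a low-rank B with BB^T ≈ Σ_{t=1}^T b_t b_t^T. The sketch below shows one common way to maintain such a factor in a streaming fashion (append a column, then truncate via SVD); the concrete update strategy and function name are assumptions, only the stated goal comes from the paper.

```python
# Minimal sketch: keep a factor B with at most k columns so that B @ B.T tracks the
# running sum of outer products sum_t b_t b_t^T, without ever forming the d x d sum.
# The append-then-truncate-via-SVD update is an assumption about the implementation.
import torch

def update_low_rank_factor(B: torch.Tensor, b_t: torch.Tensor, k: int) -> torch.Tensor:
    """B: (d, <=k) current factor; b_t: (d,) new vector; returns the updated (d, <=k) factor."""
    M = torch.cat([B, b_t.unsqueeze(1)], dim=1)   # append the new vector as an extra column
    if M.shape[1] <= k:
        return M
    # Truncate back to rank k: the top-k singular directions of M give the best
    # rank-k approximation of M @ M.T, i.e. of the accumulated sum of outer products.
    U, S, _ = torch.linalg.svd(M, full_matrices=False)
    return U[:, :k] * S[:k]                        # (d, k); (U_k S_k)(U_k S_k)^T ≈ M M^T
```

Starting from B = torch.zeros(d, 0) and calling the update once per vector b_t keeps the memory footprint at O(dk) rather than the O(d^2) of the full sum of outer products.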
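
For the Open Datasets and Dataset Splits rows, the sketch below shows how the named benchmarks could be loaded and given the 80%/20% train/validation partition with the Hugging Face datasets library. The hub IDs, config names, and use of train_test_split are assumptions; the paper only names the benchmarks and the split ratio.

```python
# Hypothetical loading of the benchmarks plus the 80%/20% train/validation split.
# Hub IDs and config names are assumptions, not taken from the paper.
from datasets import load_dataset

TASKS = {
    "WG-S": ("winogrande", "winogrande_s"),
    "WG-M": ("winogrande", "winogrande_m"),
    "ARC-C": ("ai2_arc", "ARC-Challenge"),
    "ARC-E": ("ai2_arc", "ARC-Easy"),
    "OBQA": ("openbookqa", "main"),
    "BoolQ": ("boolq", None),
}

def load_with_validation_split(task: str, seed: int = 0):
    path, name = TASKS[task]
    train = load_dataset(path, name, split="train") if name else load_dataset(path, split="train")
    # Carve a 20% validation subset out of the original training set.
    splits = train.train_test_split(test_size=0.2, seed=seed)
    return splits["train"], splits["test"]
```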
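
The Experiment Setup row's Table 7 hyperparameters map naturally onto a PEFT/Transformers configuration. The sketch below only illustrates those values; the checkpoint name, target modules, and any Trainer wiring around it are assumptions not stated in the row.

```python
# Table 7 hyperparameters expressed as a PEFT/Transformers configuration sketch.
# Checkpoint id and target modules are assumptions; the numeric values come from the row above.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed checkpoint id
lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,                        # LoRA scaling
    lora_dropout=0.1,                     # dropout probability
    target_modules=["q_proj", "v_proj"],  # assumption: attention query/value projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="laplace_lora_finetune",
    per_device_train_batch_size=4,
    max_steps=10_000,
    learning_rate=5e-5,
    weight_decay=0.0,
    lr_scheduler_type="linear",
)
# The max sequence length of 300 would be applied at tokenization time,
# e.g. tokenizer(text, truncation=True, max_length=300).
```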