BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

Authors: Haotian Sun, Yuchen Zhuang, Wei Wei, Chao Zhang, Bo Dai

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate BBOX-ADAPTER's effectiveness and cost efficiency. It improves model performance by up to 6.77% across diverse tasks and domains, while reducing training and inference costs by 31.30x and 1.84x, respectively.
Researcher Affiliation | Collaboration | 1Georgia Tech, 2Accenture. Correspondence to: Haotian Sun <haotian.sun@gatech.edu>, Bo Dai <bodai@cc.gatech.edu>.
Pseudocode | Yes | Algorithm 1: Overview of BBOX-ADAPTER.
Open Source Code | Yes | The implementation of BBOX-ADAPTER is available on GitHub: https://github.com/haotiansun14/BBox-Adapter
Open Datasets | Yes | We evaluate BBOX-ADAPTER on four distinct question-answering tasks, requiring model adaptation on mathematical (GSM8K (Cobbe et al., 2021)), implicit-reasoning (StrategyQA (Geva et al., 2021)), truthful (TruthfulQA (Lin et al., 2022)), and scientific (ScienceQA (Lu et al., 2022)) domains.
Dataset Splits | Yes | GSM8K (Cobbe et al., 2021) is a dataset... The dataset contains 7473 training samples and 1319 test samples. StrategyQA (Geva et al., 2021) is a question-answering benchmark... including 2059 training samples and 229 test samples. TruthfulQA (Lin et al., 2022)... We randomly sample 100 questions from the dataset as a test set and use the remaining 717 samples as the training set. ScienceQA (Lu et al., 2022)... We excluded questions requiring image input and randomly selected 2,000 questions for training and 500 for testing... (See the split-construction sketch after the table.)
Hardware Specification | Yes | All experiments are conducted on CPU: AMD(R) EPYC(R) 7702 64-Core Processor @ 1.50GHz and GPU: NVIDIA A100-SXM4-80GB, using Python 3.10.13.
Software Dependencies | No | The paper mentions 'Python 3.10.13' and 'toolkit packages of peft and transformers from Hugging Face' but does not specify version numbers for the key libraries peft and transformers.
Experiment Setup | Yes | For the supervised fine-tuning baseline... LoRA Dropout 0.1, # Epochs 3, Learning Rate 2e-4, Weight Decay 0.001, Batch Size / GPU 8, Max Gradient Norm 0.3, Optimizer Paged AdamW 32bit, LR Scheduler Cosine. Regarding the BBOX-ADAPTER, we set the maximum length for a generated solution as 512 and the temperature as 1.0... We set the learning rate η as 5e-6, the batch size as 64, and the number of training steps as 6,000... We employed the AdamW optimizer with a weight decay of 0.01. (See the configuration sketches after the table.)
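
As a companion to the Dataset Splits row above, here is a minimal sketch of how the reported splits could be reconstructed with the Hugging Face datasets library. Only the split sizes come from the paper; the hub identifiers, random seed, and column names are assumptions, and StrategyQA is omitted because its hub location varies across mirrors.

```python
# Sketch: rebuilding the reported dataset splits.
# Split sizes are from the paper; hub IDs, the seed, and column names are assumptions.
from datasets import load_dataset

# GSM8K already ships with the 7,473 / 1,319 train/test split reported in the paper.
gsm8k = load_dataset("gsm8k", "main")
print(len(gsm8k["train"]), len(gsm8k["test"]))  # 7473 1319

# TruthfulQA (generation config) has 817 questions in a single split;
# the paper samples 100 for testing and keeps the remaining 717 for training.
truthfulqa = load_dataset("truthful_qa", "generation")["validation"]
tq = truthfulqa.train_test_split(test_size=100, seed=42)  # seed is an assumption
tq_train, tq_test = tq["train"], tq["test"]  # 717 / 100

# ScienceQA: drop questions that require an image, then take 2,000 train / 500 test.
# The hub identifier below is a community mirror and may differ from the authors' source.
scienceqa = load_dataset("derek-thomas/ScienceQA")["train"]
text_only = scienceqa.filter(lambda ex: ex["image"] is None)
sampled = text_only.shuffle(seed=42).select(range(2500))
sq_train = sampled.select(range(2000))
sq_test = sampled.select(range(2000, 2500))
```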
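The supervised fine-tuning baseline hyperparameters quoted in the Experiment Setup row map directly onto peft and transformers configuration objects. The sketch below is one plausible wiring, not the authors' script: the LoRA rank, lora_alpha, task type, and output directory are not reported in the excerpt and are placeholders.

```python
# Sketch of the SFT baseline configuration using peft + transformers.
# Values marked "from the paper" are quoted in the Experiment Setup row; the rest are placeholders.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                  # placeholder: rank not reported in the excerpt
    lora_alpha=32,         # placeholder
    lora_dropout=0.1,      # from the paper
    task_type="CAUSAL_LM", # assumption: causal-LM fine-tuning
)

training_args = TrainingArguments(
    output_dir="sft-baseline",      # placeholder
    num_train_epochs=3,             # from the paper
    learning_rate=2e-4,             # from the paper
    weight_decay=0.001,             # from the paper
    per_device_train_batch_size=8,  # "Batch Size / GPU 8"
    max_grad_norm=0.3,              # from the paper
    optim="paged_adamw_32bit",      # "Paged AdamW 32bit" (requires bitsandbytes)
    lr_scheduler_type="cosine",     # from the paper
)
```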
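The adapter-side settings in the same row (maximum generation length 512, temperature 1.0, learning rate 5e-6, batch size 64, 6,000 training steps, AdamW with weight decay 0.01) can likewise be collected into a small schematic. The loop below only illustrates gradient updates with those values; it is not the paper's actual training objective, and adapter_model, train_loader, and compute_loss are hypothetical stand-ins.

```python
# Schematic of the BBox-Adapter training/generation settings quoted above.
# Only the numeric hyperparameters come from the paper; everything else is hypothetical.
import torch

GENERATION_KWARGS = dict(max_new_tokens=512, temperature=1.0)  # candidate-solution sampling

LEARNING_RATE = 5e-6
BATCH_SIZE = 64
NUM_TRAINING_STEPS = 6_000

def train_adapter(adapter_model, train_loader, compute_loss):
    """Run NUM_TRAINING_STEPS gradient updates with the reported optimizer settings."""
    optimizer = torch.optim.AdamW(
        adapter_model.parameters(), lr=LEARNING_RATE, weight_decay=0.01
    )
    step = 0
    while step < NUM_TRAINING_STEPS:
        for batch in train_loader:  # batches of size BATCH_SIZE
            loss = compute_loss(adapter_model, batch)  # paper-specific objective, not shown here
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            step += 1
            if step >= NUM_TRAINING_STEPS:
                break
```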