BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

Authors: Haotian Sun, Yuchen Zhuang, Wei Wei, Chao Zhang, Bo Dai

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate BBOX-ADAPTER's effectiveness and cost efficiency. It improves model performance by up to 6.77% across diverse tasks and domains, while reducing training and inference costs by 31.30x and 1.84x, respectively.
Researcher Affiliation | Collaboration | 1Georgia Tech, 2Accenture. Correspondence to: Haotian Sun <haotian.sun@gatech.edu>, Bo Dai <bodai@cc.gatech.edu>.
Pseudocode | Yes | Algorithm 1: Overview of BBOX-ADAPTER.
Open Source Code | Yes | The implementation of BBOX-ADAPTER is available on GitHub: https://github.com/haotiansun14/BBox-Adapter
Open Datasets | Yes | We evaluate BBOX-ADAPTER on four distinct question-answering tasks, requiring model adaptation on mathematical (GSM8K (Cobbe et al., 2021)), implicit-reasoning (StrategyQA (Geva et al., 2021)), truthful (TruthfulQA (Lin et al., 2022)), and scientific (ScienceQA (Lu et al., 2022)) domains.
Dataset Splits | Yes | GSM8K (Cobbe et al., 2021) is a dataset... The dataset contains 7473 training samples and 1319 test samples. StrategyQA (Geva et al., 2021) is a question-answering benchmark... including 2059 training samples and 229 test samples. TruthfulQA (Lin et al., 2022)... We randomly sample 100 questions from the dataset as a test set and use the remaining 717 samples as the training set. ScienceQA (Lu et al., 2022)... We excluded questions requiring image input and randomly selected 2,000 questions for training and 500 for testing... (See the split-construction sketch after the table.)
Hardware Specification | Yes | All experiments are conducted on CPU: AMD(R) EPYC(R) 7702 64-Core Processor @ 1.50GHz and GPU: NVIDIA A100-SXM4-80GB, using Python 3.10.13.
Software Dependencies | No | The paper mentions 'Python 3.10.13' and 'toolkit packages of peft and transformers from Hugging Face' but does not specify version numbers for the key libraries peft and transformers.
Experiment Setup | Yes | For the supervised fine-tuning baseline... LoRA Dropout 0.1, # Epochs 3, Learning Rate 2e-4, Weight Decay 0.001, Batch Size / GPU 8, Max Gradient Norm 0.3, Optimizer Paged AdamW 32bit, LR Scheduler Cosine. Regarding the BBOX-ADAPTER, we set the maximum length for a generated solution as 512 and the temperature as 1.0... We set the learning rate η as 5e-6, the batch size as 64, and the number of training steps as 6,000... We employed the AdamW optimizer with a weight decay of 0.01. (See the configuration sketches after the table.)
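
As a companion to the Dataset Splits row above, here is a minimal sketch of how the reported splits could be reconstructed with the Hugging Face datasets library. Only the split sizes come from the paper; the hub identifiers, random seed, and column names are assumptions, and StrategyQA is omitted because its hub location varies across mirrors.

```python
# Sketch: rebuilding the reported dataset splits.
# Split sizes are from the paper; hub IDs, the seed, and column names are assumptions.
from datasets import load_dataset

# GSM8K already ships with the 7,473 / 1,319 train/test split reported in the paper.
gsm8k = load_dataset("gsm8k", "main")
print(len(gsm8k["train"]), len(gsm8k["test"]))  # 7473 1319

# TruthfulQA (generation config) has 817 questions in a single split;
# the paper samples 100 for testing and keeps the remaining 717 for training.
truthfulqa = load_dataset("truthful_qa", "generation")["validation"]
tq = truthfulqa.train_test_split(test_size=100, seed=42)  # seed is an assumption
tq_train, tq_test = tq["train"], tq["test"]  # 717 / 100

# ScienceQA: drop questions that require an image, then take 2,000 train / 500 test.
# The hub identifier below is a community mirror and may differ from the authors' source.
scienceqa = load_dataset("derek-thomas/ScienceQA")["train"]
text_only = scienceqa.filter(lambda ex: ex["image"] is None)
sampled = text_only.shuffle(seed=42).select(range(2500))
sq_train = sampled.select(range(2000))
sq_test = sampled.select(range(2000, 2500))
```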
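The supervised fine-tuning baseline hyperparameters quoted in the Experiment Setup row map directly onto peft and transformers configuration objects. The sketch below is one plausible wiring, not the authors' script: the LoRA rank, lora_alpha, task type, and output directory are not reported in the excerpt and are placeholders.

```python
# Sketch of the SFT baseline configuration using peft + transformers.
# Values marked "from the paper" are quoted in the Experiment Setup row; the rest are placeholders.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                  # placeholder: rank not reported in the excerpt
    lora_alpha=32,         # placeholder
    lora_dropout=0.1,      # from the paper
    task_type="CAUSAL_LM", # assumption: causal-LM fine-tuning
)

training_args = TrainingArguments(
    output_dir="sft-baseline",      # placeholder
    num_train_epochs=3,             # from the paper
    learning_rate=2e-4,             # from the paper
    weight_decay=0.001,             # from the paper
    per_device_train_batch_size=8,  # "Batch Size / GPU 8"
    max_grad_norm=0.3,              # from the paper
    optim="paged_adamw_32bit",      # "Paged AdamW 32bit" (requires bitsandbytes)
    lr_scheduler_type="cosine",     # from the paper
)
```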
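The adapter-side settings in the same row (maximum generation length 512, temperature 1.0, learning rate 5e-6, batch size 64, 6,000 training steps, AdamW with weight decay 0.01) can likewise be collected into a small schematic. The loop below only illustrates gradient updates with those values; it is not the paper's actual training objective, and adapter_model, train_loader, and compute_loss are hypothetical stand-ins.

```python
# Schematic of the BBox-Adapter training/generation settings quoted above.
# Only the numeric hyperparameters come from the paper; everything else is hypothetical.
import torch

GENERATION_KWARGS = dict(max_new_tokens=512, temperature=1.0)  # candidate-solution sampling

LEARNING_RATE = 5e-6
BATCH_SIZE = 64
NUM_TRAINING_STEPS = 6_000

def train_adapter(adapter_model, train_loader, compute_loss):
    """Run NUM_TRAINING_STEPS gradient updates with the reported optimizer settings."""
    optimizer = torch.optim.AdamW(
        adapter_model.parameters(), lr=LEARNING_RATE, weight_decay=0.01
    )
    step = 0
    while step < NUM_TRAINING_STEPS:
        for batch in train_loader:  # batches of size BATCH_SIZE
            loss = compute_loss(adapter_model, batch)  # paper-specific objective, not shown here
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            step += 1
            if step >= NUM_TRAINING_STEPS:
                break
```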