Black-Box Tuning for Language-Model-as-a-Service
Authors: Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that the black-box tuning with RoBERTa on a few labeled samples not only significantly outperforms manual prompt and GPT-3's in-context learning, but also surpasses the gradient-based counterparts, i.e., prompt tuning and full model tuning. |
| Researcher Affiliation | Academia | 1Fudan University 2East China Normal University 3Peng Cheng Laboratory. |
| Pseudocode | No | The paper describes the approach and optimization process but does not include a formal pseudocode block or an algorithm section. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/txsun1997/Black-Box-Tuning |
| Open Datasets | Yes | Dataset. We conduct experiments on several common language understanding tasks including sentiment analysis, topic classification, natural language inference (NLI), and paraphrase. For sentiment analysis, we choose SST-2 (Socher et al., 2013) and Yelp polarity (Zhang et al., 2015a). For topic classification, we choose AG's News and DBPedia (Zhang et al., 2015a). For NLI, we choose SNLI (Bowman et al., 2015) and RTE (Wang et al., 2019). For paraphrase, we choose MRPC (Dolan & Brockett, 2005). |
| Dataset Splits | Yes | We randomly select k samples for each class to construct a k-shot training set Dtrain, and compose a development set Ddev by randomly drawing another k samples from the original training set and ensure that \|Dtrain\| = \|Ddev\| to simulate the true few-shot learning setting (Perez et al., 2021). (See the sampling sketch below the table.) |
| Hardware Specification | Yes | All the methods are implemented with PyTorch (Paszke et al., 2019) and experimented on a single NVIDIA GTX 3090 GPU. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" and "ONNX Runtime" but does not specify version numbers for these components, nor does it list any other software dependencies with versions. |
| Experiment Setup | Yes | For black-box tuning, we give in Table 2 the default configuration of hyper-parameters used in our experiments. The effect of each hyper-parameter is explored in Section 4.3. Table 2 includes: Prompt length (L) 50, Subspace dimension (d) 500, Population size (λ) 20, Random projection (A) Uniform, Loss function (L) Cross Entropy, Budget (# of API calls) 8000. Additionally, for Prompt Tuning: "Adam optimizer (Kingma & Ba, 2015) with learning rate of 5e-4 and batch size of 16 for 1000 epochs." (See the tuning-loop sketch below the table.) |
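
The Dataset Splits row above describes the k-shot sampling procedure. A minimal sketch of that construction is given below; it assumes examples are dictionaries with a "label" field, and the helper name `k_shot_split` is illustrative rather than taken from the released code.

```python
# Minimal sketch of the k-shot split described in the Dataset Splits row.
# Assumes examples are dicts with a "label" field; names are illustrative.
import random
from collections import defaultdict

def k_shot_split(examples, k, seed=42):
    """Draw k examples per class for D_train and another k per class for D_dev,
    so that |D_train| == |D_dev| (true few-shot setting)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    train, dev = [], []
    for items in by_label.values():
        rng.shuffle(items)
        train.extend(items[:k])       # k samples per class for D_train
        dev.extend(items[k:2 * k])    # another k per class for D_dev
    return train, dev
```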
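
The Experiment Setup row lists the default hyper-parameters of the derivative-free optimization. The sketch below shows how they fit together, assuming the `cma` and `numpy` packages; `black_box_loss` is a hypothetical placeholder for the PLM inference API and the initial step size is a guess, so this is an illustration rather than the authors' implementation.

```python
# Minimal sketch of black-box tuning with CMA-ES over a low-dimensional subspace,
# using the hyper-parameters from Table 2. Requires `pip install cma numpy`.
import numpy as np
import cma

L, D_EMB = 50, 1024      # prompt length; embedding size (1024 for RoBERTa-large)
d = 500                  # subspace dimension
POP_SIZE = 20            # CMA-ES population size (lambda)
BUDGET = 8000            # total number of inference API calls

# Random projection A maps the d-dimensional z to the L x D_EMB prompt space.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(L * D_EMB, d))

def black_box_loss(prompt_embedding: np.ndarray) -> float:
    """Hypothetical stand-in for one inference API call returning the
    cross-entropy loss of the PLM on the few-shot training set."""
    return float(np.linalg.norm(prompt_embedding))  # placeholder objective

es = cma.CMAEvolutionStrategy(d * [0.0], 1.0,
                              {"popsize": POP_SIZE, "maxfevals": BUDGET, "seed": 42})
while not es.stop():
    candidates = es.ask()                                  # lambda candidate z's
    losses = [black_box_loss(A @ np.asarray(z)) for z in candidates]
    es.tell(candidates, losses)

best_prompt = (A @ es.result.xbest).reshape(L, D_EMB)      # final soft prompt
```

In the paper, only the low-dimensional z is optimized by CMA-ES while the PLM parameters and the projection A stay frozen, which is why only forward inference calls are needed.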