InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
Authors: Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate INSTRUCTZERO on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. INSTRUCTZERO outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code is available: https://github.com/Lichang-Chen/InstructZero. Extensive experiments demonstrate that our method could effectively generate instructions that enhance task performance while achieving predictions on par with or even superior to those created by previous methods. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park. Correspondence to: Lichang Chen <bobchen@cs.umd.edu>, Jiuhai Chen <jchen169@umd.edu>. |
| Pseudocode | Yes | The complete procedure is provided in Algorithm 1. |
| Open Source Code | Yes | Our code is available: https://github.com/Lichang-Chen/InstructZero. |
| Open Datasets | Yes | We assess the effectiveness of zero-shot in-context learning on instruction tasks proposed in (Honovich et al., 2022), including all 24 tasks used in previous auto-instruction work (Zhou et al., 2022). We further add 8 extra tasks to enrich the benchmark for evaluating all methods in more comprehensive scenarios spanning many facets of language understanding. We provide detailed descriptions of each task in the Appendix. Training-set examples can be used for instruction optimization but the final instruction p is evaluated on a held-out test set. Zero-shot performance H(p) on the test set is reported. |
| Dataset Splits | Yes | For each task, we draw τ = 5 and 20 samples from the training set as the exemplars and validation set Dt, respectively. |
| Hardware Specification | Yes | All training and tests are conducted on a NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions several LLMs and APIs used (e.g., Vicuna, ChatGPT, GPT-3.5-turbo, LLaMA, Stanford Alpaca, GPT-4, Claude, PaLM-2). However, it does not specify explicit version numbers for these or any other software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For each task, we draw τ = 5 and 20 samples from the training set as the exemplars and validation set Dt, respectively. For the number of tokens in soft prompts, we search for the best value among {3, 5, 10} based on the validation set performance. We draw entries of the random projection matrix A from a uniform distribution over [−1, 1]. The dimensionality d of p is set to 10. In experiments, we apply a mini-batch version of INSTRUCTZERO that explores 25 soft prompts in every iteration. |
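The random-projection step quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' released code: the variable names, the embedding width (`embed_dim = 4096`), and the use of NumPy are assumptions; only `d = 10`, the soft-prompt lengths searched over {3, 5, 10}, and the uniform [−1, 1] entries of A come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10            # intrinsic dimension of the optimized point p (from the paper)
n_tokens = 5      # soft-prompt length, searched over {3, 5, 10} (from the paper)
embed_dim = 4096  # hidden size of the open-source LLM (illustrative assumption)

# Random projection matrix A with entries drawn uniformly from [-1, 1],
# mapping the low-dimensional p into the LLM's soft-prompt space.
A = rng.uniform(-1.0, 1.0, size=(n_tokens * embed_dim, d))

# A candidate point proposed by the Bayesian-optimization loop.
p = rng.standard_normal(d)

# The projected soft prompt, reshaped into n_tokens embedding vectors
# that would be prepended to the LLM's input embeddings.
soft_prompt = (A @ p).reshape(n_tokens, embed_dim)

print(soft_prompt.shape)  # (5, 4096)
```

Optimizing in the 10-dimensional space of p rather than over the full `n_tokens × embed_dim` soft prompt is what keeps the black-box Bayesian optimization tractable.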