Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Authors: Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 31 datasets covering language understanding, generation tasks, as well as BIG-Bench Hard (BBH) tasks. EVOPROMPT significantly outperforms human-engineered prompts and existing methods for automatic prompt generation (e.g., up to 25% on BBH). |
| Researcher Affiliation | Collaboration | 1 Tsinghua University, 2 Microsoft Research, 3 Northeastern University |
| Pseudocode | Yes | Algorithm 1 Discrete prompt optimization: EVOPROMPT |
| Open Source Code | Yes | Our code is available at https://github.com/beeevita/EvoPrompt. |
| Open Datasets | Yes | We first conduct experiments on language understanding tasks across 7 datasets to validate our methods, including sentiment classification (SST-2 (Socher et al., 2013), MR (Pang & Lee, 2005), CR (Hu & Liu, 2004), SST-5 (Socher et al., 2013)), topic classification (AG's News (Zhang et al., 2015), TREC (Voorhees & Tice, 2000)) and subjectivity classification (Subj (Pang & Lee, 2004)). For summarization, we adopt SAMSum (Gliwa et al., 2019)... for text simplification... we employ the ASSET dataset (Alva-Manchego et al., 2020)... we apply BBH (Suzgun et al., 2022). |
| Dataset Splits | Yes | Specifically, abstaining from any gradients or parameters, EVOPROMPT starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU models, CPU types, or cloud instance specifications) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | The parameters for the experiments are shown in Table 11. For evolutionary algorithms implemented by GPT-3.5... we use Top-p decoding (temperature=0.5, P = 0.95). For the task implementation, we use greedy decoding and the default temperature for Alpaca. For the generation tasks implemented by GPT-3.5, the temperature is 0.0. |
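
For orientation, below is a minimal Python sketch of the GA instantiation of Algorithm 1 as quoted above, folded together with the decoding settings reported from Table 11 (temperature 0.5 / top-p 0.95 for the LLM-driven evolution step; greedy decoding or temperature 0.0 for task inference). This is reconstructed from the paper's description only, not taken from the authors' repository; `llm_evolve` and `score_on_dev` are hypothetical placeholders for the LLM-based evolutionary operator and the development-set evaluation.

```python
"""Sketch of the EVOPROMPT (GA variant) loop, assuming hypothetical helpers."""
import random


def llm_evolve(parent_a: str, parent_b: str) -> str:
    # Hypothetical: prompt the evolution LLM (temperature=0.5, top_p=0.95 per
    # Table 11) to cross over and mutate two parent prompts into a child prompt.
    raise NotImplementedError("call the evolution LLM here")


def score_on_dev(prompt: str) -> float:
    # Hypothetical: run the task LLM (greedy decoding / temperature=0.0 for
    # GPT-3.5 generation tasks) on the development set and return the metric.
    raise NotImplementedError("evaluate the prompt on the dev set here")


def evoprompt_ga(initial_prompts, iterations=10):
    # Start from an initial population of prompts and score each on the dev set.
    population = list(initial_prompts)
    scores = {p: score_on_dev(p) for p in population}
    pop_size = len(population)

    for _ in range(iterations):
        children = []
        for _ in range(pop_size):
            # Selection: sample two parents, biased toward higher dev scores.
            parents = random.choices(population, weights=[scores[p] for p in population], k=2)
            # Evolution: the LLM acts as the crossover + mutation operator.
            child = llm_evolve(parents[0], parents[1])
            scores[child] = score_on_dev(child)
            children.append(child)
        # Update: keep only the top-scoring prompts for the next generation.
        population = sorted(population + children, key=scores.get, reverse=True)[:pop_size]

    # Return the best prompt found, as measured on the development set.
    return max(population, key=scores.get)
```

The sketch mirrors the paper's gradient-free setup: no model parameters are touched, and the development-set score is the only feedback signal driving the population update.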