Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing
Authors: Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis X Charles
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that Evoke significantly outperforms existing methods. |
| Researcher Affiliation | Collaboration | Microsoft, University of Washington, University of Michigan |
| Pseudocode | Yes | Algorithm 1: Evoke |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We perform a comprehensive evaluation on eight tasks from Instruction Induction (Honovich et al., 2022) and Big Bench Instruction Induction (BBII) (Zhou et al., 2022), including orthography starts with, common concept, rhymes, movie recommendation, logical fallacy detection, presuppositions as nli, winowhy, epistemic reasoning. |
| Dataset Splits | No | The paper states: 'For each task, we divide the dataset randomly into two sets, 60% of the data is allocated for training (prompt refinement) and the remaining 40% is for testing (prompt evaluation).' It does not explicitly mention a validation set or split. |
| Hardware Specification | No | The paper states: 'In all experiments, we utilize the Azure Open AI API service (GPT-4) for the involved LLMs.' It does not specify the underlying hardware specifications (e.g., specific GPU models, CPUs) used for running the experiments beyond this API usage. |
| Software Dependencies | No | The paper mentions using 'Azure Open AI API service (GPT-4)' but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | The paper describes the workflow of Evoke, including the roles of LLM-Author, LLM-Reviewer, and LLM-Selector, their prompts, and the iterative refinement process. For example, 'The workflow comprises three steps: First, the LLM-Author edits prompts from previous iterations, taking into account the past edits and the feedback from the LLM-Reviewer. Second, the LLM-Reviewer scores the revised prompts from the LLM-Author, and the top-n candidates with the highest scores are selected for subsequent procedures. ... Details of the algorithm can be found in Algorithm 1.' A minimal sketch of this loop appears after the table. |
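
For orientation, below is a minimal Python sketch of the reviewer-author loop and the 60/40 data split described above. It is not the authors' released code (none is provided): the function names (`call_llm`, `evaluate_prompt`, `parse_score`), the prompt templates, and the default hyperparameters are illustrative assumptions, and the LLM-Selector step elided in the quoted excerpt is omitted here.

```python
import random
import re


def call_llm(instruction: str) -> str:
    """Placeholder for the GPT-4 call (the paper uses the Azure OpenAI API service)."""
    raise NotImplementedError


def evaluate_prompt(prompt: str, examples) -> float:
    """Placeholder: task accuracy of `prompt` on a set of examples."""
    raise NotImplementedError


def split_dataset(examples, train_frac=0.6, seed=0):
    """Random 60/40 split into refinement (train) and evaluation (test) sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def parse_score(review: str) -> float:
    """Pull the first number out of the reviewer's critique (a simplifying assumption)."""
    match = re.search(r"\d+(\.\d+)?", review)
    return float(match.group()) if match else 0.0


def evoke(initial_prompt: str, train_set, num_iters: int = 5, top_n: int = 2):
    """Iterative reviewer-author prompt editing, loosely following Algorithm 1."""
    candidates = [initial_prompt]
    feedback = ""
    for _ in range(num_iters):
        # Step 1: LLM-Author edits the previous prompts, taking the reviewer's
        # feedback into account.
        edited = [
            call_llm(
                "Rewrite the task prompt below to address the reviewer feedback.\n"
                f"Prompt: {prompt}\nFeedback: {feedback}"
            )
            for prompt in candidates
        ]
        # Step 2: LLM-Reviewer critiques and scores each revised prompt; the
        # top-n candidates with the highest scores are kept for the next round.
        reviews = [
            (call_llm(f"Score this task prompt from 0 to 10 and explain why:\n{prompt}"), prompt)
            for prompt in edited
        ]
        reviews.sort(key=lambda item: parse_score(item[0]), reverse=True)
        feedback = reviews[0][0]
        candidates = [prompt for _, prompt in reviews[:top_n]]
    # Return the candidate that performs best on the refinement split.
    return max(candidates, key=lambda prompt: evaluate_prompt(prompt, train_set))
```

Under these assumptions, a run would look like `train_set, test_set = split_dataset(examples)` followed by `best_prompt = evoke("Classify whether the hypothesis follows.", train_set)`, with final evaluation of `best_prompt` on `test_set`.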