Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Authors: Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP is different from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe that both base LLMs (i.e. not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively. |
| Researcher Affiliation | Collaboration | Seonghyeon Ye¹, Hyeonbin Hwang¹, Sohee Yang¹,², Hyeongu Yun³, Yireun Kim³, Minjoon Seo¹ (¹KAIST, ²UCL, ³LG AI Research) |
| Pseudocode | No | The paper describes the rules for TAPP construction in paragraph form and bullet points, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | Yes | We construct the demonstrations for TAPP by utilizing English training tasks of Super-NaturalInstructions (SuperNI) benchmark (Wang et al. 2022c) as the task pool, which includes 756 tasks in total. |
| Dataset Splits | Yes | We construct the demonstrations for TAPP by utilizing English training tasks of Super-NaturalInstructions (SuperNI) benchmark (Wang et al. 2022c) as the task pool... To evaluate the effectiveness of TAPP, we use the held-out tasks from SuperNI for testing. |
| Hardware Specification | No | The paper makes no mention of specific hardware used for the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We select K tasks as demonstrations for TAPP... Unless specified, we set K = 8 as default... Because we mainly experiment on 175B-sized GPT-3, we set the default maximum input sequence as 2048. (A minimal sketch of this setup follows the table.) |
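
The Pseudocode and Experiment Setup rows above describe TAPP only in prose: a fixed set of K = 8 demonstrations, sampled once from the SuperNI training tasks, is prepended to every input at inference within a 2048-token budget. The sketch below illustrates that mechanic in Python; the `Definition / Input / Output` serialization, the placeholder demonstrations, and the helper names are assumptions made for illustration, not the paper's exact construction.

```python
# Minimal sketch of TAPP-style prompting, assuming a simple
# "Definition / Input / Output" serialization (the exact format is
# not quoted above, so this layout is an assumption).

K = 8  # number of fixed cross-task demonstrations (paper default)
MAX_INPUT_LENGTH = 2048  # default maximum input sequence for 175B GPT-3

# Hypothetical demonstrations: (instruction, input, label) triples sampled
# once from the SuperNI training-task pool, then frozen for every query.
demonstrations = [
    ("Classify the sentiment of the sentence.", "I loved this movie.", "positive"),
    ("Answer the question with yes or no.", "Is the sky green?", "no"),
    # ... K - 2 more demonstrations, each from a distinct training task ...
]

def build_tapp_prefix(demos):
    """Serialize the fixed demonstrations into one task-agnostic prefix."""
    blocks = [
        f"Definition: {inst}\nInput: {inp}\nOutput: {out}"
        for inst, inp, out in demos
    ]
    return "\n\n".join(blocks) + "\n\n"

# Computed once and reused verbatim for every target task; this is what
# makes the prefix "task-agnostic" rather than a per-task prompt.
TAPP_PREFIX = build_tapp_prefix(demonstrations)

def make_prompt(instruction, task_input):
    """Prepend the identical prefix to every query, regardless of target task.

    In the paper's setup the full prompt is subject to the model's
    2048-token limit; tokenization and truncation are omitted here.
    """
    return TAPP_PREFIX + f"Definition: {instruction}\nInput: {task_input}\nOutput:"

print(make_prompt("Translate the sentence to French.", "Good morning."))
```

Because the prefix never changes across target tasks, it can be built once and cached, which is the key difference from canonical in-context learning, where demonstrations are re-selected per task.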