Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

Authors: Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present our finding that prepending a Task Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP is different from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe that both base LLMs (i.e. not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively.
Researcher Affiliation | Collaboration | Seonghyeon Ye1, Hyeonbin Hwang1, Sohee Yang1,2, Hyeongu Yun3, Yireun Kim3, Minjoon Seo1 1KAIST 2UCL 3LG AI Research
Pseudocode | No | The paper describes the rules for TAPP construction in paragraph form and bullet points, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is open-source or publicly available.
Open Datasets | Yes | We construct the demonstrations for TAPP by utilizing English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark (Wang et al. 2022c) as the task pool, which includes 756 tasks in total.
Dataset Splits | Yes | We construct the demonstrations for TAPP by utilizing English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark (Wang et al. 2022c) as the task pool... To evaluate the effectiveness of TAPP, we use the held-out tasks from SUPERNI for testing
Hardware Specification | No | The paper makes no mention of specific hardware used for the experiments, such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We select K tasks as demonstrations for TAPP... Unless specified, we set K = 8 as default... Because we mainly experiment on 175B-sized GPT-3, we set the default maximum input sequence as 2048.
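The mechanism the table summarizes (a fixed prefix of K = 8 task demonstrations prepended to every input, bounded by a maximum context length) can be sketched as follows. This is a minimal illustration, not the authors' released code: the demonstration fields (`definition`, `input`, `output`), the prompt template, and the character-based length cap standing in for the 2048-token limit are all assumptions.

```python
K = 8                   # number of demonstration tasks (paper default)
MAX_INPUT_CHARS = 8192  # stand-in for the 2048-token context limit (token counting omitted)

def build_tapp(demonstrations, k=K):
    """Concatenate k fixed demonstrations into a task-agnostic prefix prompt."""
    parts = []
    for demo in demonstrations[:k]:
        parts.append(
            f"Definition: {demo['definition']}\n"
            f"Input: {demo['input']}\n"
            f"Output: {demo['output']}\n"
        )
    return "\n".join(parts)

def apply_tapp(tapp, task_instruction, task_input, max_chars=MAX_INPUT_CHARS):
    """Prepend the same fixed prefix to any target-task input, regardless of task."""
    prompt = f"{tapp}\nDefinition: {task_instruction}\nInput: {task_input}\nOutput:"
    # Crude left-truncation: if the prompt is too long, drop the earliest
    # demonstrations so the target task at the end always survives.
    return prompt[-max_chars:]

# Hypothetical demonstration pool (the paper draws these from SuperNI training tasks).
demos = [{"definition": f"Task {i}", "input": f"x{i}", "output": f"y{i}"} for i in range(10)]
tapp = build_tapp(demos)
print(apply_tapp(tapp, "Classify the sentiment of the sentence.", "Great movie!"))
```

The key property, zero-shot generalization, comes from `tapp` being built once and reused verbatim for every target task, unlike conventional few-shot prompts that are selected per task.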