Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

Authors: Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present our finding that prepending a Task Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP is different from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe that both base LLMs (i.e. not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively.
Researcher Affiliation | Collaboration | Seonghyeon Ye1, Hyeonbin Hwang1, Sohee Yang1,2, Hyeongu Yun3, Yireun Kim3, Minjoon Seo1 1KAIST 2UCL 3LG AI Research
Pseudocode | No | The paper describes the rules for TAPP construction in paragraph form and bullet points, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is open-source or publicly available.
Open Datasets | Yes | We construct the demonstrations for TAPP by utilizing English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark (Wang et al. 2022c) as the task pool, which includes 756 tasks in total.
Dataset Splits | Yes | We construct the demonstrations for TAPP by utilizing English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark (Wang et al. 2022c) as the task pool... To evaluate the effectiveness of TAPP, we use the held-out tasks from SUPERNI for testing
Hardware Specification | No | The paper makes no mention of specific hardware used for the experiments, such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We select K tasks as demonstrations for TAPP... Unless specified, we set K = 8 as default... Because we mainly experiment on 175B-sized GPT-3, we set the default maximum input sequence as 2048.
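The mechanism the table summarizes (a fixed prefix of K = 8 task demonstrations prepended to every input, bounded by a maximum context length) can be sketched as follows. This is a minimal illustration, not the authors' released code: the demonstration fields (`definition`, `input`, `output`), the prompt template, and the character-based length cap standing in for the 2048-token limit are all assumptions.

```python
K = 8                   # number of demonstration tasks (paper default)
MAX_INPUT_CHARS = 8192  # stand-in for the 2048-token context limit (token counting omitted)

def build_tapp(demonstrations, k=K):
    """Concatenate k fixed demonstrations into a task-agnostic prefix prompt."""
    parts = []
    for demo in demonstrations[:k]:
        parts.append(
            f"Definition: {demo['definition']}\n"
            f"Input: {demo['input']}\n"
            f"Output: {demo['output']}\n"
        )
    return "\n".join(parts)

def apply_tapp(tapp, task_instruction, task_input, max_chars=MAX_INPUT_CHARS):
    """Prepend the same fixed prefix to any target-task input, regardless of task."""
    prompt = f"{tapp}\nDefinition: {task_instruction}\nInput: {task_input}\nOutput:"
    # Crude left-truncation: if the prompt is too long, drop the earliest
    # demonstrations so the target task at the end always survives.
    return prompt[-max_chars:]

# Hypothetical demonstration pool (the paper draws these from SuperNI training tasks).
demos = [{"definition": f"Task {i}", "input": f"x{i}", "output": f"y{i}"} for i in range(10)]
tapp = build_tapp(demos)
print(apply_tapp(tapp, "Classify the sentiment of the sentence.", "Great movie!"))
```

The key property, zero-shot generalization, comes from `tapp` being built once and reused verbatim for every target task, unlike conventional few-shot prompts that are selected per task.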