Guiding Large Language Models via Directional Stimulus Prompting

Authors: Zekun Li, Baolin Peng, Pengcheng He, Michel Galley, Jianfeng Gao, Xifeng Yan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate a consistent improvement in the performance of LLMs such as ChatGPT, Codex, and InstructGPT on these supervised tasks with minimal labeled data.
Researcher Affiliation | Collaboration | University of California, Santa Barbara and Microsoft. {zekunli, xyan}@cs.ucsb.edu, {bapeng,penhe,mgalley,jfgao}@microsoft.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and data are publicly available. https://github.com/Leezekun/Directional-Stimulus-Prompting
Open Datasets | Yes | We conduct our experiments on the CNN/Daily Mail dataset, a widely-used news summarization benchmark. ... We conduct experiments on the popular task-oriented dialogue dataset MultiWOZ [7], including both the MultiWOZ 2.0 (the original version) and MultiWOZ 2.1 version [15].
Dataset Splits | Yes (see the loading sketch after this table) | This dataset contains 287,113 training examples, 13,368 validation examples, and 11,490 test examples. To keep the API usage cost low, we use a subset of 1,000, 2,000, and 4,000 for training, 500 for validation, and 500 for testing.
Hardware Specification | Yes | All the experiments are run on a server equipped with 8 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions software like T5, Flan-T5, ChatGPT, Codex, InstructGPT, and the spaCy package, but does not provide specific version numbers for these or other ancillary software dependencies.
Experiment Setup | Yes (see the config sketch after this table) | The hyperparameters used in our experiments are detailed in Table 3. ... Supervised fine-tuning (SFT): batch size 8, epochs 5, learning rate 0.00002 ... RL (NLPO): steps per update 5120, total number of steps 51200, batch size 8, epochs per update 5, learning rate 0.000002 ...
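The Dataset Splits row quotes the full CNN/Daily Mail split sizes and the smaller subsets the authors use to limit API cost. The following is a minimal sketch of how such subsets could be drawn, assuming the Hugging Face `datasets` copy of CNN/Daily Mail is an acceptable stand-in; the `subsample` helper and the seed are ours, not the paper's.

```python
# Sketch only: reproduce the subsampled CNN/Daily Mail splits described above.
from datasets import load_dataset

# Full splits: 287,113 train / 13,368 validation / 11,490 test examples.
cnn_dm = load_dataset("cnn_dailymail", "3.0.0")

def subsample(split, n, seed=42):
    """Take a fixed-size random subset of a split; the seed choice is ours."""
    return split.shuffle(seed=seed).select(range(n))

# Paper-reported subset sizes: 1,000 / 2,000 / 4,000 train, 500 val, 500 test.
train_1k = subsample(cnn_dm["train"], 1_000)
train_2k = subsample(cnn_dm["train"], 2_000)
train_4k = subsample(cnn_dm["train"], 4_000)
val_500  = subsample(cnn_dm["validation"], 500)
test_500 = subsample(cnn_dm["test"], 500)

print(len(train_4k), len(val_500), len(test_500))  # 4000 500 500
```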
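The Experiment Setup row lists the Table 3 hyperparameters for the SFT and RL (NLPO) stages. Below is a minimal sketch collecting those values, expressed as a Hugging Face `TrainingArguments` object for SFT and a plain dictionary for NLPO; whether the released code uses these exact interfaces, and the `output_dir` and key names, are our assumptions.

```python
# Sketch only: Table 3 hyperparameters as quoted in the Experiment Setup row.
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="./sft_policy",       # hypothetical path
    per_device_train_batch_size=8,   # batch size: 8
    num_train_epochs=5,              # epochs: 5
    learning_rate=2e-5,              # 0.00002
)

nlpo_config = {
    "steps_per_update": 5_120,
    "total_steps": 51_200,
    "batch_size": 8,
    "epochs_per_update": 5,
    "learning_rate": 2e-6,           # 0.000002
}
```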