Guiding Large Language Models via Directional Stimulus Prompting

Authors: Zekun Li, Baolin Peng, Pengcheng He, Michel Galley, Jianfeng Gao, Xifeng Yan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate a consistent improvement in the performance of LLMs such as ChatGPT, Codex, and InstructGPT on these supervised tasks with minimal labeled data.
Researcher Affiliation | Collaboration | University of California, Santa Barbara and Microsoft. {zekunli, xyan}@cs.ucsb.edu, {bapeng,penhe,mgalley,jfgao}@microsoft.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and data are publicly available. https://github.com/Leezekun/Directional-Stimulus-Prompting
Open Datasets | Yes | We conduct our experiments on the CNN/Daily Mail dataset, a widely-used news summarization benchmark. ... We conduct experiments on the popular task-oriented dialogue dataset MultiWOZ [7], including both the MultiWOZ 2.0 (the original version) and MultiWOZ 2.1 version [15].
Dataset Splits | Yes (see the loading sketch after this table) | This dataset contains 287,113 training examples, 13,368 validation examples, and 11,490 test examples. To keep the API usage cost low, we use a subset of 1,000, 2,000, and 4,000 for training, 500 for validation, and 500 for testing.
Hardware Specification | Yes | All the experiments are run on a server equipped with 8 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions software like T5, Flan-T5, ChatGPT, Codex, InstructGPT, and the spaCy package, but does not provide specific version numbers for these or other ancillary software dependencies.
Experiment Setup | Yes (see the config sketch after this table) | The hyperparameters used in our experiments are detailed in Table 3. ... Supervised fine-tuning (SFT): batch size 8, epochs 5, learning rate 0.00002 ... RL (NLPO): steps per update 5120, total number of steps 51200, batch size 8, epochs per update 5, learning rate 0.000002 ...
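The Dataset Splits row quotes the full CNN/Daily Mail split sizes and the smaller subsets the authors use to limit API cost. The following is a minimal sketch of how such subsets could be drawn, assuming the Hugging Face `datasets` copy of CNN/Daily Mail is an acceptable stand-in; the `subsample` helper and the seed are ours, not the paper's.

```python
# Sketch only: reproduce the subsampled CNN/Daily Mail splits described above.
from datasets import load_dataset

# Full splits: 287,113 train / 13,368 validation / 11,490 test examples.
cnn_dm = load_dataset("cnn_dailymail", "3.0.0")

def subsample(split, n, seed=42):
    """Take a fixed-size random subset of a split; the seed choice is ours."""
    return split.shuffle(seed=seed).select(range(n))

# Paper-reported subset sizes: 1,000 / 2,000 / 4,000 train, 500 val, 500 test.
train_1k = subsample(cnn_dm["train"], 1_000)
train_2k = subsample(cnn_dm["train"], 2_000)
train_4k = subsample(cnn_dm["train"], 4_000)
val_500  = subsample(cnn_dm["validation"], 500)
test_500 = subsample(cnn_dm["test"], 500)

print(len(train_4k), len(val_500), len(test_500))  # 4000 500 500
```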
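The Experiment Setup row lists the Table 3 hyperparameters for the SFT and RL (NLPO) stages. Below is a minimal sketch collecting those values, expressed as a Hugging Face `TrainingArguments` object for SFT and a plain dictionary for NLPO; whether the released code uses these exact interfaces, and the `output_dir` and key names, are our assumptions.

```python
# Sketch only: Table 3 hyperparameters as quoted in the Experiment Setup row.
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="./sft_policy",       # hypothetical path
    per_device_train_batch_size=8,   # batch size: 8
    num_train_epochs=5,              # epochs: 5
    learning_rate=2e-5,              # 0.00002
)

nlpo_config = {
    "steps_per_update": 5_120,
    "total_steps": 51_200,
    "batch_size": 8,
    "epochs_per_update": 5,
    "learning_rate": 2e-6,           # 0.000002
}
```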