Learning to Compress Prompts with Gist Tokens

Authors: Jesse Mu, Xiang Lisa Li, Noah Goodman

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On decoder (LLaMA-7B) and encoder-decoder (FLAN-T5-XXL) LMs, gisting enables up to 26x compression of prompts, resulting in up to 40% FLOPs reductions, 4.2% wall time speedups, and storage savings, all with minimal loss in output quality.
Researcher Affiliation | Academia | Jesse Mu, Xiang Lisa Li, Noah Goodman; Stanford University; muj@cs.stanford.edu, {xlisali,ngoodman}@stanford.edu
Pseudocode | Yes | Appendix A: Example PyTorch Implementation of Gist Masking (see the sketch after this table).
Open Source Code | Yes | Code, data, and model checkpoints are available at https://github.com/jayelm/gisting.
Open Datasets | Yes | To obtain the largest possible set of tasks for instruction finetuning, we create a dataset called Alpaca+, which combines the Self-Instruct [36] and Stanford Alpaca [31] instruction tuning datasets, each consisting of (t, x, y) tuples sampled from OpenAI's text-davinci-001 and text-davinci-003 variants of GPT-3, respectively.
Dataset Splits | Yes | From Alpaca+ we hold out 3 validation splits: 1000 Seen prompts (with unseen, non-empty inputs); 1000 Unseen prompts (with non-empty inputs); and the 252 hand-written Human prompts and completions used in Wang et al. [36], of which 83% have non-empty inputs.
Hardware Specification | Yes | Experiments were run on a cluster machine with 4x A100-SXM4-80GB NVIDIA GPUs, 480GB RAM, and 16 CPUs, using PyTorch 2.0 [24], Hugging Face Transformers [41], and DeepSpeed [29].
Software Dependencies | Yes | Experiments were run on a cluster machine with 4x A100-SXM4-80GB NVIDIA GPUs, 480GB RAM, and 16 CPUs, using PyTorch 2.0 [24], Hugging Face Transformers [41], and DeepSpeed [29].
Experiment Setup | Yes | Full hyperparameters for training runs are located in Table A.1.
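
The Pseudocode row refers to the paper's Appendix A, which provides a PyTorch implementation of gist masking: tokens that come after the gist tokens are prevented from attending to the prompt tokens before them, so the gist tokens must carry the prompt's information. The sketch below is a minimal illustration of that masking idea, not the authors' released code; the function names, the gist_token_id argument, and the toy token ids are assumptions made for this example, and for a decoder-only LM the returned mask would still be intersected with the usual causal mask.

import torch


def reverse_cumsum(x: torch.Tensor) -> torch.Tensor:
    """Cumulative sum from right to left along the last dimension."""
    return x + x.sum(-1, keepdim=True) - x.cumsum(-1)


def make_gist_mask(input_ids: torch.Tensor, gist_token_id: int) -> torch.Tensor:
    """Build a boolean attention mask of shape (batch, 1, seq, seq).

    Entry [b, 0, i, j] is True if query position i may attend to key
    position j. Queries after the gist span may not see keys before it.
    """
    is_gist = (input_ids == gist_token_id).long()      # (batch, seq)

    # True at positions at or after the first gist token.
    at_or_after_first_gist = is_gist.cumsum(-1) >= 1
    # True at positions at or before the last gist token.
    at_or_before_last_gist = reverse_cumsum(is_gist) >= 1

    # Block (query, key) pairs where the query lies strictly after the
    # gist span and the key lies strictly before it; allow everything else.
    query_after_gist = ~at_or_before_last_gist          # (batch, seq)
    key_before_gist = ~at_or_after_first_gist           # (batch, seq)
    blocked = query_after_gist[:, :, None] & key_before_gist[:, None, :]

    return (~blocked)[:, None]                          # (batch, 1, seq, seq)


# Toy usage: prompt tokens [5, 6], one gist token (id 99), then input tokens [7, 8].
ids = torch.tensor([[5, 6, 99, 7, 8]])
mask = make_gist_mask(ids, gist_token_id=99)
# Positions 3 and 4 (after the gist) cannot attend to positions 0 and 1 (before it).
assert not mask[0, 0, 3, 0] and mask[0, 0, 3, 2]

The reverse cumulative sum avoids an explicit search for the last gist position and keeps the whole construction batched; the paper's Appendix A and the linked repository contain the authors' reference implementation.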