Learning to Compress Prompts with Gist Tokens
Authors: Jesse Mu, Xiang Lisa Li, Noah Goodman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On decoder (LLaMA-7B) and encoder-decoder (FLAN-T5-XXL) LMs, gisting enables up to 26x compression of prompts, resulting in up to 40% FLOPs reductions, 4.2% wall time speedups, and storage savings, all with minimal loss in output quality. |
| Researcher Affiliation | Academia | Jesse Mu, Xiang Lisa Li, Noah Goodman Stanford University muj@cs.stanford.edu, {xlisali,ngoodman}@stanford.edu |
| Pseudocode | Yes | A Example PyTorch Implementation of Gist Masking (see the sketch after this table) |
| Open Source Code | Yes | Code, data, and model checkpoints are available at https://github.com/jayelm/gisting. |
| Open Datasets | Yes | To obtain the largest possible set of tasks for instruction finetuning, we create a dataset called Alpaca+, which combines the Self-Instruct [36] and Stanford Alpaca [31] instruction tuning datasets, each consisting of (t, x, y) tuples sampled from OpenAI's text-davinci-001 and text-davinci-003 variants of GPT-3, respectively. |
| Dataset Splits | Yes | From Alpaca+ we hold out 3 validation splits: 1000 Seen prompts (with unseen, non-empty inputs); 1000 Unseen prompts (with non-empty inputs); and the 252 hand-written Human prompts and completions used in Wang et al. [36], of which 83% have non-empty inputs. |
| Hardware Specification | Yes | Experiments were run on a cluster machine with 4x A100-SXM4-80GB NVIDIA GPUs, 480GB RAM, and 16 CPUs, using PyTorch 2.0 [24], Hugging Face Transformers [41], and DeepSpeed [29]. |
| Software Dependencies | Yes | Experiments were run on a cluster machine with 4x A100-SXM4-80GB NVIDIA GPUs, 480GB RAM, and 16 CPUs, using PyTorch 2.0 [24], Hugging Face Transformers [41], and DeepSpeed [29]. |
| Experiment Setup | Yes | Full hyperparameters for training runs are located in Table A.1. |
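
The paper's Appendix A provides the authors' reference PyTorch implementation of gist masking. The snippet below is an independent, simplified sketch of the same idea, not the paper's code: the function name `make_gist_mask`, the boolean-mask convention, and the toy gist token id `99` are illustrative assumptions. It captures the core constraint that tokens after the gist span may attend to the gist tokens but not to the original prompt tokens before them.

```python
import torch

def make_gist_mask(input_ids: torch.Tensor, gist_token_id: int) -> torch.Tensor:
    """Return a boolean attention mask of shape (batch, 1, seq, seq).

    Gist masking: positions after the gist span may attend to the gist tokens
    and to later positions, but not to the prompt tokens preceding the gist.
    Positions at or before the gist keep full visibility here; the usual
    causal mask is intersected with this mask downstream.
    """
    is_gist = input_ids == gist_token_id          # (batch, seq) bool
    # True at or after the first gist token in each row.
    after_first_gist = is_gist.long().cumsum(-1) >= 1

    # Queries strictly after the gist span ...
    post_gist_query = after_first_gist & ~is_gist
    # ... may only attend to keys at or after the first gist token.
    allowed_keys = after_first_gist

    full_visibility = torch.ones_like(is_gist)    # every key allowed
    mask = torch.where(
        post_gist_query[:, :, None],              # (batch, seq, 1)
        allowed_keys[:, None, :],                 # (batch, 1, seq)
        full_visibility[:, None, :],
    )                                             # (batch, seq, seq)
    return mask[:, None]                          # broadcastable head dim


# Toy usage: token ids are arbitrary; 99 stands in for the single gist token.
ids = torch.tensor([[5, 6, 7, 99, 8, 9]])
print(make_gist_mask(ids, gist_token_id=99)[0, 0].int())
# Rows 4-5 (post-gist queries) attend only to columns 3-5 (gist and later).
```

In the paper, a mask of this form is combined with the model's standard causal (decoder-only) or encoder/decoder attention masks during finetuning; at inference time, because post-gist tokens never look past the gist, the gist tokens' key/value activations can be cached and reused in place of the full prompt, which is the source of the FLOPs, wall-time, and storage savings quoted above.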