MIROSTAT: A NEURAL TEXT DECODING ALGORITHM THAT DIRECTLY CONTROLS PERPLEXITY
Authors: Sourya Basu, Govardana Sachitanandam Ramachandran, Nitish Shirish Keskar, Lav R. Varshney
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that for low values of k and p, perplexity drops significantly with generated text length, leading to excessive repetitions (the boredom trap). Conversely, for large values of k and p, perplexity increases with generated text length, leading to incoherence (the confusion trap). Mirostat avoids both traps. Specifically, we show that setting the target perplexity above a threshold yields negligible sentence-level repetitions. Experiments with human raters for fluency, coherence, and quality further verify our findings. |
| Researcher Affiliation | Collaboration | Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign; Salesforce Research |
| Pseudocode | Yes | Algorithm 1: Mirostat sampling for perplexity control |
| Open Source Code | Yes | Code is available at https://github.com/basusourya/mirostat |
| Open Datasets | Yes | We use the GPT-2 LM with 117M parameters for all experiments (Radford et al., 2019) unless mentioned otherwise, and just refer to it as GPT-2. |
| Dataset Splits | No | The paper does not specify train/validation/test splits for the data used in their experiments, as they are primarily evaluating a text decoding algorithm on a pre-trained language model rather than training a new model. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using the GPT-2 LM but does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Alg. 1 details mirostat, which generates text with a predetermined average surprise value. The input is a target surprise value τ, which in turn initializes a variable µ = 2τ. Each word is sampled by first estimating s from (30) as ŝ, then using top-k sampling where k is a function of the estimated s and of the target surprise value of the output text. [...] Compute error: e = S(X) − τ. Update µ: µ = µ − ηe. Also, for human evaluations: 'We generated 300 tokens using GPT-2 from a fixed context with average cross-entropy rate τ ∈ {2.5, 3, 4, 5} using both mirostat and top-p sampling.' |
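The Experiment Setup row summarizes Algorithm 1; the sketch below restates that control loop in Python/PyTorch to make the steps concrete. It is a minimal reconstruction under stated assumptions, not the authors' released implementation (the code linked above is authoritative): the names `mirostat_step` and `estimate_zipf_s`, the use of the top 100 probabilities when estimating s, and measuring surprise in bits (base-2 log) are illustrative choices.

```python
# Minimal sketch of one mirostat decoding step (Alg. 1), assuming a 1-D
# `logits` tensor over the vocabulary from a causal LM such as GPT-2.
# Names and constants here are illustrative assumptions, not the paper's code.
import math
import torch

def estimate_zipf_s(sorted_probs, n_estimate=100):
    """Estimate the Zipf exponent s from the top token probabilities (cf. eq. (30))."""
    num, denom = 0.0, 0.0
    for i in range(min(n_estimate, len(sorted_probs)) - 1):
        t_i = math.log((i + 2) / (i + 1))
        b_i = math.log(sorted_probs[i] / sorted_probs[i + 1])
        num += t_i * b_i
        denom += t_i * t_i
    return num / denom

def mirostat_step(logits, mu, tau, eta=0.1):
    """Sample one token targeting surprise tau (bits); return (token_id, updated mu)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    n_vocab = probs.numel()

    # Estimate s, then choose k so that the expected surprise of a Zipfian
    # distribution matches the current threshold mu (initialized to 2 * tau).
    s_hat = estimate_zipf_s(sorted_probs.tolist())
    eps = s_hat - 1
    k = int(((eps * (2 ** mu)) / (1 - n_vocab ** (-eps))) ** (1 / s_hat))
    k = max(1, min(k, n_vocab))

    # Top-k sampling over the k most probable tokens.
    topk_probs = sorted_probs[:k] / sorted_probs[:k].sum()
    idx = torch.multinomial(topk_probs, 1).item()
    token_id = sorted_ids[idx].item()

    # Observed surprise S(X) of the sampled token, then the feedback update
    # e = S(X) - tau and mu = mu - eta * e.
    observed_surprise = -math.log2(probs[token_id].item())
    mu = mu - eta * (observed_surprise - tau)
    return token_id, mu
```

In a generation loop, µ would be initialized to 2τ and updated after every sampled token, e.g. `token_id, mu = mirostat_step(logits, mu, tau=3.0)` inside the per-token decoding step.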