reproducibilityindex.ai

Closing the Curious Case of Neural Text Degeneration

Authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a pilot investigation (§5) to empirically evaluate this basis-aware truncation sampling approach. Our results show improvements on an open-ended generation task via both automatic and human evaluation metrics under low-entropy generation (i.e., close to greedy).
Researcher Affiliation	Collaboration	Matthew Finlayson (University of Southern California); John Hewitt (Stanford University); Alexander Koller (Saarland University); Swabha Swayamdipta (University of Southern California); Ashish Sabharwal (The Allen Institute for AI)
Pseudocode	Yes	Algorithm 1 gives the procedure for BAT sampling. Algorithm 1 BAT sampling
Open Source Code	Yes	Code for experiments: https://github.com/mattf1n/basis-aware-threshold.
Open Datasets	Yes	We generate completions for 5000 35-token prefixes taken from the Open Web Text (OWT) (Gokaslan et al., 2019).
Dataset Splits	Yes	We perform a parameter sweep for nucleus, η, and ϵ sampling and select the parameter that gives the highest MAUVE score on the OWT validation set (see Table 3 in the appendix).
Hardware Specification	No	The paper mentions running experiments with 'GPT-2' and notes that 'No open-source solver we tried was able to solve a single problem in a reasonable amount of time... Proprietary solvers do better in some cases, but only the MOSEK solver (ApS, 2023) was able to solve the full problem in under 1 minute.' However, it does not specify any CPU, GPU, or TPU models, or any other hardware configuration used to run the experiments.
Software Dependencies	Yes	Proprietary solvers do better in some cases, but only the MOSEK solver (ApS, 2023) was able to solve the full problem in under 1 minute. ... MOSEK ApS. MOSEK Optimizer API for Python 9.3.22. Version 10.0., 2023.
Experiment Setup	Yes	We perform a parameter sweep for nucleus, η, and ϵ sampling and select the parameter that gives the highest MAUVE score on the OWT validation set (see Table 3 in the appendix). We control for the parameter choice in comparisons between BAT methods and their vanilla counterparts by selecting the BAT parameter that rejects the same proportion of tokens from a corpus of human text as the vanilla method; see Appendix F for more details. ... We expose δ as a parameter to tune the restrictiveness of the sampling method.
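
The parameter-matching step quoted under Experiment Setup can be illustrated with a minimal sketch. It is not the authors' code (their procedure is in Appendix F and the linked repository). It assumes that a method's rejection rate has already been measured on a corpus of human text for each candidate parameter, and it picks the candidate whose rate is closest to that of the vanilla method. The function names `rejection_rate` and `match_parameter`, and all numbers, are hypothetical.

```python
from typing import Mapping, Sequence


def rejection_rate(token_probs: Sequence[float], epsilon: float) -> float:
    """Fraction of human-written tokens that an epsilon-style threshold would reject.

    token_probs holds the model probability of each gold token in the corpus.
    A token counts as rejected when its probability falls below epsilon.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    rejected = sum(1 for p in token_probs if p < epsilon)
    return rejected / len(token_probs)


def match_parameter(target_rate: float, candidate_rates: Mapping[float, float]) -> float:
    """Return the candidate parameter whose rejection rate is closest to target_rate."""
    if not candidate_rates:
        raise ValueError("candidate_rates must not be empty")
    return min(candidate_rates, key=lambda p: abs(candidate_rates[p] - target_rate))


# Hypothetical numbers for illustration only; they are not from the paper.
vanilla_rate = rejection_rate([0.5, 0.0001, 0.02, 0.3], epsilon=0.001)
bat_rates = {0.001: 0.05, 0.003: 0.24, 0.01: 0.40}  # candidate parameter -> measured rate
matched = match_parameter(vanilla_rate, bat_rates)
print(f"vanilla rejection rate: {vanilla_rate}, matched BAT parameter: {matched}")
```

Matching on rejection rate rather than on the raw parameter value puts both methods at a comparable level of restrictiveness, which is the stated purpose of the control.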