reproducibilityindex.ai

Stay on Topic with Classifier-Free Guidance

Authors: Guillaume Sanchez, Alexander Spangher, Honglu Fan, Elad Levi, Stella Biderman

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate across a wide array of benchmarks that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across a broad set of Q&A, reasoning and code generation tasks, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B;
Researcher Affiliation	Collaboration	(1) LightOn, France (work done while working at Hexaglobe); (2) EleutherAI; (3) Information Sciences Institute, University of Southern California; (4) University of Geneva; (5) Sightful.
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our work has been directly incorporated into leading open-source libraries: Huggingface and llama.cpp.
Open Datasets	Yes	We run these models on a sample dataset of 32,902 data points from P3 (Sanh et al., 2021).
Dataset Splits	No	The paper does not explicitly state specific training, validation, or test dataset splits (percentages or counts) for the datasets used.
Hardware Specification	No	The paper mentions "He took care of running the experiments of Section 3.1 thanks to his access to CoreWeave and Stability's computing cluster." but does not provide specific hardware details such as GPU/CPU models or memory.
Software Dependencies	No	The paper mentions using "EleutherAI's Language Model Evaluation Harness (Gao et al., 2021)" but does not specify its version number or any other software dependencies with version information.
Experiment Setup	Yes	We test different CFG strengths [3] and different temperatures, evaluating at pass@k for k = 1, 10, 100 [4]. We show the results for temperature = 0.2 in Table 2 [5]. Footnote 3: γ = 1.0, 1.1, 1.25, 1.5, 1.75, 2.0
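
The experiment setup above sweeps a CFG strength γ and scores generations with pass@k. As a rough illustration only, the sketch below shows one common way to write both pieces. It assumes the usual CFG formulation, in which the conditional and unconditional log-probabilities are combined as uncond + γ · (cond − uncond), and the widely used unbiased pass@k estimator from the HumanEval benchmark. The function names and the toy logits are hypothetical, and whether the paper's code matches this sketch exactly is an assumption.

```python
import math

# CFG strengths listed in footnote 3 of the experiment setup.
GAMMAS = [1.0, 1.1, 1.25, 1.5, 1.75, 2.0]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_log_probs(cond_logits, uncond_logits, gamma):
    """Combine prompt-conditioned and unconditioned next-token logits.

    Assumed formulation: uncond + gamma * (cond - uncond) in log-probability
    space, renormalized. gamma = 1.0 recovers ordinary conditional sampling.
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    mixed = [u + gamma * (c - u) for c, u in zip(cond, uncond)]
    return log_softmax(mixed)


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    # Toy 3-token vocabulary (hypothetical numbers, not from the paper).
    cond_logits = [2.0, 0.0, 0.0]
    uncond_logits = [0.0, 0.0, 0.0]
    for gamma in GAMMAS:
        top = math.exp(cfg_log_probs(cond_logits, uncond_logits, gamma)[0])
        print(f"gamma={gamma}: P(top token)={top:.3f}")
```

In this toy setting, raising γ above 1 shifts more probability mass toward the token the prompt favors, which is the effect the γ sweep is meant to probe.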