Stay on Topic with Classifier-Free Guidance

Authors: Guillaume Sanchez, Alexander Spangher, Honglu Fan, Elad Levi, Stella Biderman

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate across a wide array of benchmarks that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across a broad set of Q&A, reasoning and code generation tasks, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B;
Researcher Affiliation | Collaboration | 1 LightOn, France (work done while working at Hexaglobe); 2 EleutherAI; 3 Information Sciences Institute, University of Southern California; 4 University of Geneva; 5 Sightful.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our work has been directly incorporated into leading open-source libraries: Huggingface and llama.cpp.
Open Datasets | Yes | We run these models on a sample dataset of 32,902 data points from P3 (Sanh et al., 2021).
Dataset Splits | No | The paper does not explicitly state specific training, validation, or test dataset splits (percentages or counts) for the datasets used.
Hardware Specification | No | The paper mentions "He took care of running the experiments of Section 3.1 thanks to his access to CoreWeave and Stability's computing cluster" but does not provide specific hardware details such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions using "EleutherAI's Language Model Evaluation Harness (Gao et al., 2021)" but does not specify its version number or any other software dependencies with version information.
Experiment Setup | Yes | We test different CFG strengths (γ = 1.0, 1.1, 1.25, 1.5, 1.75, 2.0) and different temperatures, evaluating at pass@k for k = 1, 10, 100. We show the results for temperature = 0.2 in Table 2.
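
The Research Type and Experiment Setup rows above both describe classifier-free guidance as a pure inference-time operation on next-token logits. As a minimal sketch of that operation (not the authors' code; the model handle, context handling, and default gamma are illustrative), one decoding step combines a prompt-conditioned forward pass with an unconditional one:

```python
import torch

def cfg_next_token_logits(model, cond_ids, uncond_ids, gamma=1.5):
    """One CFG decoding step: blend log-probabilities from a prompt-conditioned
    pass and an unconditional (or negative-prompt) pass with strength gamma.
    gamma = 1.0 recovers ordinary decoding; the paper sweeps gamma from 1.0 to 2.0."""
    with torch.no_grad():
        cond_logits = model(cond_ids).logits[:, -1, :]      # full prompt in context
        uncond_logits = model(uncond_ids).logits[:, -1, :]  # little or no prompt in context
    cond_logprobs = torch.log_softmax(cond_logits, dim=-1)
    uncond_logprobs = torch.log_softmax(uncond_logits, dim=-1)
    # log p_cfg = log p_uncond + gamma * (log p_cond - log p_uncond)
    return uncond_logprobs + gamma * (cond_logprobs - uncond_logprobs)
```

Sampling then proceeds from the blended distribution as usual, with the chosen token appended to both contexts before the next step.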
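
The Open Source Code row notes that the method was upstreamed into Hugging Face transformers and llama.cpp. A usage sketch along the following lines should therefore be possible; the guidance_scale and negative_prompt_ids arguments to generate() are assumptions about recent transformers releases, so check the documentation of your installed version before relying on them.

```python
# Hedged usage sketch, assuming a transformers release whose generate() exposes
# CFG via guidance_scale / negative_prompt_ids; names may differ by version.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt_ids = tok("Stay on topic: the history of the Eiffel Tower.", return_tensors="pt").input_ids
uncond_ids = tok(tok.bos_token, return_tensors="pt").input_ids  # unconditional branch

out = model.generate(
    prompt_ids,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    guidance_scale=1.5,             # assumed name for the CFG strength (gamma in the paper)
    negative_prompt_ids=uncond_ids, # assumed name for the unconditional/negative context
)
print(tok.decode(out[0], skip_special_tokens=True))
```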
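
The Experiment Setup row reports code-generation results at pass@k for k = 1, 10, and 100. The paper does not restate the estimator, but HumanEval-style evaluations conventionally use the unbiased estimator of Chen et al. (2021); the sketch below assumes n samples are drawn per problem and c of them pass the unit tests.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that a draw of k samples (without replacement)
    from n generations, c of which are correct, contains at least one correct one."""
    if n - c < k:
        return 1.0  # every size-k draw necessarily contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 generations per problem, 37 passing.
print({k: round(pass_at_k(200, 37, k), 3) for k in (1, 10, 100)})
```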