Principled Gradient-Based MCMC for Conditional Sampling of Text

Authors: Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Holden Lee, Jason Eisner, Ryan Cotterell

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on various forms of text generation, we demonstrate that our unbiased samplers are able to generate more fluent text while better adhering to the control objectives.
Researcher Affiliation | Academia | ¹Johns Hopkins University, ²ETH Zürich, ³MIT CSAIL, ⁴University of Southern California.
Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not provide any statements about code release or links to a source code repository.
Open Datasets | Yes | For topic controlled task...E2E dataset (Novikova et al., 2017). For sentiment controlled task...SST2 dataset of movie reviews (Socher et al., 2013). For position constrained task...COLLIE (Yao et al., 2024).
Dataset Splits | No | The paper mentions evaluating classifiers on a "test set" but does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts) for the primary datasets used in the MCMC experiments or for training the language models/classifiers from scratch.
Hardware Specification | No | The experiments in this work were carried out at the Advanced Research Computing at Hopkins (ARCH) core facility, which is supported by the National Science Foundation (NSF) grant number OAC 1920103.
Software Dependencies | No | The paper mentions using "GPT-2 checkpoint from the Huggingface library" but does not specify version numbers for Python, PyTorch, or other key software dependencies.
Experiment Setup | Yes | All step sizes are tuned with grid search with a grid resolution of 0.1. For the Toy Example, the inverse temperature β = 0.42 and the sequence length (i.e., the number of spins in the Ising model) is N = 5. The step size for MuCoLa is 1.5, the trajectory length of SVS is 2π, and the step size of p-NCG and GwL are both 1.0.
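The toy setting is small enough to enumerate exactly: with N = 5 spins there are only 2^5 = 32 configurations, so a sampler's bias can be measured directly against the true distribution. The following is a minimal sketch under stated assumptions, not the paper's setup: it assumes a 1-D ferromagnetic chain with unit nearest-neighbour coupling and no external field (the report does not give the model's couplings), uses total variation distance as an assumed bias metric, and uses plain single-site Metropolis as a stand-in sampler rather than MuCoLa, SVS, p-NCG or GwL. All function names are hypothetical.

```python
import itertools
import math
import random

# ASSUMPTION: 1-D Ising chain, spins in {-1, +1}, coupling J = 1, no field.
# Only N = 5 and beta = 0.42 come from the reported setup.
N = 5
BETA = 0.42


def energy(spins):
    """Ferromagnetic chain energy: minus the sum of neighbouring spin products."""
    return -sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))


def exact_distribution(n=N, beta=BETA):
    """Enumerate all 2**n states and normalise the weights exp(-beta * E)."""
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-beta * energy(s)) for s in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}


def metropolis_single_flip(num_steps, n=N, beta=BETA, seed=0):
    """Random-scan single-site Metropolis (a stand-in, not the paper's samplers).

    Returns the empirical frequency of each visited state.
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    counts = {}
    for _ in range(num_steps):
        i = rng.randrange(n)
        proposal = spins.copy()
        proposal[i] = -proposal[i]  # symmetric proposal: flip one spin
        delta = energy(proposal) - energy(spins)
        # Accept with probability min(1, exp(-beta * delta)).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            spins = proposal
        key = tuple(spins)
        counts[key] = counts.get(key, 0) + 1
    return {s: c / num_steps for s, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    exact = exact_distribution()
    empirical = metropolis_single_flip(200_000)
    print(f"TV distance to exact target: {total_variation(exact, empirical):.4f}")
```

Because the target is known exactly, an unbiased sampler's TV distance should shrink toward zero as the number of steps grows, while a biased one plateaus; the same harness could wrap any other sampler that returns state frequencies.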