COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics

Authors: Lianhui Qin, Sean Welleck, Daniel Khashabi, Yejin Choi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on these constrained generation tasks point to the effectiveness of our approach, both in terms of automatic and human evaluation.
Researcher Affiliation | Collaboration | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington; 2 Allen Institute for Artificial Intelligence; 3 Department of Computer Science, Johns Hopkins University
Pseudocode | Yes | Algorithm 1 Constrained Decoding w/ Langevin Dynamics.
Open Source Code | Yes | Code is available at https://github.com/qkaren/COLD_decoding
Open Datasets | Yes | We use the benchmark dataset TIMETRAVEL [44]. We use the set of constraint words from the COMMONGEN corpus [29].
Dataset Splits | No | We select the constraint weights on the dev set. However, specific percentages or counts for training/validation/test splits are not provided in the main text.
Hardware Specification | Yes | The table below shows the results (on an NVIDIA Quadro GV100 GPU, batch size=32).
Software Dependencies | No | The paper mentions using "GPT2-XL [46]" as the base language model, but does not specify version numbers for any software libraries or dependencies (e.g., PyTorch version, specific deep learning framework versions).
Experiment Setup | Yes | Throughout the experiments, we set the number of Langevin dynamics steps to N = 2000, with a step size = 0.1 (Eq. 2). We gradually decrease σ(n) across iterations, which intuitively transitions the decoding procedure from exploration to optimization. In our experiments, we typically used the schedule which sets/reduces σ to {1, 0.5, 0.1, 0.05, 0.01} at iterations {0, 50, 500, 1000, 1500}, respectively.
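The Experiment Setup row pins down the optimization hyperparameters the paper reports: N = 2000 Langevin steps, step size 0.1, and a noise scale σ that is dropped in stages from 1 to 0.01. Below is a minimal PyTorch sketch of how such values could plug into a Langevin-dynamics update over soft token logits; it is not the authors' implementation (see the released repository above), and `energy_fn`, the sequence length, and the vocabulary size are placeholders.

```python
import torch

def sigma_schedule(step):
    # Noise scale reduced at fixed iterations, as quoted in the Experiment Setup row:
    # sigma -> {1, 0.5, 0.1, 0.05, 0.01} at iterations {0, 50, 500, 1000, 1500}.
    boundaries = [(1500, 0.01), (1000, 0.05), (500, 0.1), (50, 0.5), (0, 1.0)]
    for start, sigma in boundaries:
        if step >= start:
            return sigma
    return 1.0

def cold_decode(energy_fn, seq_len=20, vocab_size=50257, n_steps=2000, step_size=0.1):
    """Run n_steps of Langevin dynamics on a continuous (soft) output sequence y.

    energy_fn: placeholder callable mapping a (seq_len, vocab_size) tensor of soft
    logits to a scalar energy combining fluency and constraint terms.
    """
    y = torch.randn(seq_len, vocab_size, requires_grad=True)
    for n in range(n_steps):
        energy = energy_fn(y)
        (grad,) = torch.autograd.grad(energy, y)
        noise = sigma_schedule(n) * torch.randn_like(y)
        # Langevin update: gradient descent on the energy plus annealed Gaussian noise,
        # moving from exploration (large sigma) to optimization (small sigma).
        y = (y - step_size * grad + noise).detach().requires_grad_(True)
    # Discretize the soft sequence; plain arg-max per position is a simplification of
    # the paper's LM-guided top-k discretization step.
    return y.argmax(dim=-1)
```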