COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics
Authors: Lianhui Qin, Sean Welleck, Daniel Khashabi, Yejin Choi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on these constrained generation tasks point to the effectiveness of our approach, both in terms of automatic and human evaluation. |
| Researcher Affiliation | Collaboration | 1Paul G. Allen School of Computer Science & Engineering, University of Washington 2Allen Institute for Artificial Intelligence 3Department of Computer Science, Johns Hopkins University |
| Pseudocode | Yes | Algorithm 1 Constrained Decoding w/ Langevin Dynamics. |
| Open Source Code | Yes | Code is available at https://github.com/qkaren/COLD_decoding |
| Open Datasets | Yes | We use the benchmark dataset TIMETRAVEL [44]. We use the set of constraint words from the COMMONGEN corpus [29]. |
| Dataset Splits | No | We select the constraint weights on the dev set. However, specific percentages or counts for training/validation/test splits are not provided in the main text. |
| Hardware Specification | Yes | The table below shows the results (on an NVIDIA Quadro GV100 GPU, batch size=32). |
| Software Dependencies | No | The paper mentions using "GPT2-XL [46]" as the base language model, but does not specify version numbers for any software libraries or dependencies (e.g., PyTorch version, specific deep learning framework versions). |
| Experiment Setup | Yes | Throughout the experiments, we set the number of Langevin dynamics steps to N = 2000, with a step size of 0.1 (Eq. 2). We gradually decrease σ(n) across iterations, which intuitively transitions the decoding procedure from exploration to optimization. In our experiments, we typically used a schedule that decreases σ to {1, 0.5, 0.1, 0.05, 0.01} at iterations {0, 50, 500, 1000, 1500}, respectively. |
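
The Experiment Setup row quotes the core hyperparameters of the decoding procedure: N = 2000 Langevin steps, a step size of 0.1, and a noise level σ(n) annealed over iterations. Below is a minimal sketch of that loop, assuming PyTorch and a user-supplied `energy` function standing in for the paper's composed soft-constraint energy E(ỹ); the update ỹ ← ỹ − η∇E(ỹ) + ε, with ε ~ N(0, σ(n)), follows the Langevin-dynamics rule the quote attributes to Eq. 2, and the final discretization of the soft sequence back into text is omitted. Function and argument names here are illustrative, not taken from the released code.

```python
import torch


def sigma_at(step: int) -> float:
    """Piecewise-constant noise schedule from the quoted setup:
    sigma drops to {1, 0.5, 0.1, 0.05, 0.01} at steps {0, 50, 500, 1000, 1500}."""
    schedule = [(0, 1.0), (50, 0.5), (500, 0.1), (1000, 0.05), (1500, 0.01)]
    sigma = schedule[0][1]
    for start, value in schedule:
        if step >= start:
            sigma = value
    return sigma


def langevin_decode(energy, soft_tokens: torch.Tensor,
                    n_steps: int = 2000, step_size: float = 0.1) -> torch.Tensor:
    """Refine a soft token sequence y~ (e.g., logits over the vocabulary) by
    repeatedly applying  y~ <- y~ - eta * grad E(y~) + eps,  eps ~ N(0, sigma(n)).

    `energy` is assumed to map the soft sequence to a scalar energy value."""
    y = soft_tokens.detach().clone().requires_grad_(True)
    for n in range(n_steps):
        loss = energy(y)                          # scalar energy E(y~)
        grad, = torch.autograd.grad(loss, y)      # gradient w.r.t. the soft tokens
        with torch.no_grad():
            y -= step_size * grad                 # gradient step on the energy
            y += torch.randn_like(y) * sigma_at(n)  # annealed exploration noise
    return y.detach()
```

A caller would pass an `energy` that combines the soft-constraint terms (e.g., fluency under the base LM plus task-specific constraints) as a weighted sum; annealing σ(n) toward zero makes early iterations explore and late iterations behave like plain gradient descent on the energy, matching the "exploration to optimization" description in the quoted setup.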