Diffusion-LM Improves Controllable Text Generation

Authors: Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S. Liang, Tatsunori B. Hashimoto

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work. We train Diffusion-LM for two language modeling tasks. We then apply the controllable generation method to 5 classifier-guided control tasks, and apply MBR decoding to a classifier-free control task (i.e., infilling). We measure the impact of our proposed design choices through lm-score. (Illustrative sketches of the classifier-guided update and MBR selection follow the table.)
Researcher Affiliation | Academia | Xiang Lisa Li, Stanford University, xlisali@stanford.edu; John Thickstun, Stanford University, jthickst@stanford.edu; Ishaan Gulrajani, Stanford University, igul@stanford.edu; Percy Liang, Stanford University, pliang@cs.stanford.edu; Tatsunori B. Hashimoto, Stanford University, thashim@stanford.edu
Pseudocode | No | The paper describes the methods and processes in narrative text and with diagrams, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/XiangLi1999/Diffusion-LM.git
Open Datasets | Yes | We train Diffusion-LM on two datasets: E2E [34] and ROCStories [32].
Dataset Splits | Yes | For each control task (e.g., semantic content), we sample 200 control targets c (e.g., rating=5 star) from the validation splits, and we generate 50 samples for each control target.
Hardware Specification | No | The paper mentions the model architecture and parameter count ('80M parameters') but does not provide any specific details regarding the hardware (e.g., CPU/GPU models, memory) used for experiments.
Software Dependencies | No | The paper mentions models like Transformer and GPT-2, and optimizers like Adagrad, but it does not specify any software libraries or dependencies with version numbers (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x').
Experiment Setup | Yes | Our Diffusion-LM is based on the Transformer [52] architecture with 80M parameters, with a sequence length n = 64, diffusion steps T = 2000, and a square-root noise schedule (see Appendix A for details). We treat the embedding dimension as a hyperparameter, setting d = 16 for E2E and d = 128 for ROCStories. See Appendix B for hyperparameter details.
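
As a reference point for the Experiment Setup row, the square-root noise schedule can be written down compactly. The sketch below assumes the form alpha_bar_t = 1 - sqrt(t/T + s) with a small offset s, which is how the paper describes the schedule in Appendix A; the offset value, the clipping, and the function name are illustrative choices rather than details of the released code.

```python
import numpy as np

# Reported setup: T = 2000 diffusion steps; embedding dim d = 16 (E2E) or d = 128 (ROCStories).
T = 2000

def sqrt_noise_schedule(T=T, s=1e-4):
    """Assumed square-root schedule: alpha_bar_t = 1 - sqrt(t/T + s).

    The noise level 1 - alpha_bar_t rises quickly for small t and then more
    gradually toward t = T. Clipping keeps alpha_bar in [0, 1] despite the
    small offset s (an illustrative value here).
    """
    t = np.arange(T + 1)
    return np.clip(1.0 - np.sqrt(t / T + s), 0.0, 1.0)

alpha_bar = sqrt_noise_schedule()
# The forward process samples x_t ~ N(sqrt(alpha_bar[t]) * x_0, (1 - alpha_bar[t]) * I).
```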
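The Research Type row refers to the controllable generation method applied to the classifier-guided tasks. The paper frames this as gradient updates on the continuous diffusion latents, trading off the classifier's log-probability of the control target against fluency under the diffusion model. The sketch below is a hedged reconstruction, not the released implementation: classifier_logp and diffusion_logp are hypothetical callables standing in for the trained attribute classifier and the model's denoising likelihood term, the hyperparameter values are placeholders, and Adagrad mirrors the optimizer the paper mentions.

```python
import torch

def controlled_update(x_prev, target, classifier_logp, diffusion_logp,
                      fluency_weight=0.01, lr=0.1, num_steps=3):
    """One control step on the continuous latent x_{t-1} (shape (n, d)).

    Performs gradient ascent on
        classifier_logp(x, target) + fluency_weight * diffusion_logp(x),
    pushing the latent toward the control target while keeping it likely
    under the diffusion model. Both callables are assumed to return scalar
    log-probability tensors.
    """
    x = x_prev.detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adagrad([x], lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = -(classifier_logp(x, target) + fluency_weight * diffusion_logp(x))
        loss.backward()
        optimizer.step()
    return x.detach()
```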
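For the classifier-free infilling task, the same row notes that MBR (minimum Bayes risk) decoding is used instead: draw a set of candidate generations and return the one with the lowest expected risk against the others. The sketch below uses a token-level edit distance as a stand-in risk function; the paper's actual risk (e.g., a BLEU-based loss) and candidate count are not restated here, so treat those specifics as illustrative.

```python
def edit_distance(a, b):
    """Token-level Levenshtein distance, used here only as a stand-in risk."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def mbr_select(candidates, risk=edit_distance):
    """Return the candidate with the lowest average risk against all other candidates."""
    def expected_risk(c):
        others = [o for o in candidates if o is not c]
        return sum(risk(c, o) for o in others) / max(len(others), 1)
    return min(candidates, key=expected_risk)

# Usage (hypothetical): mbr_select([s.split() for s in sampled_infills])
```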