Diffusion-LM Improves Controllable Text Generation
Authors: Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S. Liang, Tatsunori B. Hashimoto
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work. We train Diffusion-LM for two language modeling tasks. We then apply the controllable generation method to five classifier-guided control tasks, and apply MBR decoding to a classifier-free control task (i.e., infilling). We measure the impact of our proposed design choices through lm-score. |
| Researcher Affiliation | Academia | Xiang Lisa Li, Stanford University, xlisali@stanford.edu; John Thickstun, Stanford University, jthickst@stanford.edu; Ishaan Gulrajani, Stanford University, igul@stanford.edu; Percy Liang, Stanford University, pliang@cs.stanford.edu; Tatsunori B. Hashimoto, Stanford University, thashim@stanford.edu |
| Pseudocode | No | The paper describes the methods and processes in narrative text and with diagrams, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/XiangLi1999/Diffusion-LM.git |
| Open Datasets | Yes | We train Diffusion-LM on two datasets: E2E [34] and ROCStories [32]. |
| Dataset Splits | Yes | For each control task (e.g. semantic content), we sample 200 control targets c (e.g., rating=5 star) from the validation splits, and we generate 50 samples for each control target. |
| Hardware Specification | No | The paper mentions the model architecture and parameter count ('80M parameters') but does not provide any specific details regarding the hardware (e.g., CPU/GPU models, memory) used for experiments. |
| Software Dependencies | No | The paper mentions models like Transformer and GPT-2, and optimizers like Adagrad, but it does not specify any software libraries or dependencies with version numbers (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x'). |
| Experiment Setup | Yes | Our Diffusion-LM is based on Transformer [52] architecture with 80M parameters, with a sequence length n = 64, diffusion steps T = 2000 and a square-root noise schedule (see Appendix A for details). We treat the embedding dimension as a hyperparameter, setting d = 16 for E2E and d = 128 for ROCStories. See Appendix B for hyperparameter details. |
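The setup row above quotes the paper's key hyperparameters: sequence length n = 64, diffusion steps T = 2000, a square-root noise schedule, and embedding dimension d = 16 for E2E and d = 128 for ROCStories. The snippet below is a minimal sketch of how such a schedule could be tabulated, assuming the parameterization ᾱ_t = 1 − √(t/T + s) with a small offset s; the offset value and helper names are illustrative and not taken from the paper or its released code.

```python
import numpy as np

# Hyperparameters quoted in the reproducibility table above.
SEQ_LEN = 64                              # sequence length n
NUM_STEPS = 2000                          # diffusion steps T
EMB_DIM = {"E2E": 16, "ROCStories": 128}  # embedding dimension d per dataset

def sqrt_alpha_bar(T: int = NUM_STEPS, s: float = 1e-4) -> np.ndarray:
    """Cumulative signal level ᾱ_t = 1 - sqrt(t/T + s) for t = 1..T.

    `s` is a small offset so the schedule does not start at exactly 1;
    its value here is an assumption, not a figure from the paper.
    """
    t = np.arange(1, T + 1)
    return np.clip(1.0 - np.sqrt(t / T + s), 0.0, 1.0)

# Per-step noise variances recovered from the cumulative signal level:
# beta_t = 1 - ᾱ_t / ᾱ_{t-1}, with ᾱ_0 taken as 1.
alpha_bar = sqrt_alpha_bar()
alpha_bar_prev = np.concatenate(([1.0], alpha_bar[:-1]))
betas = 1.0 - alpha_bar / alpha_bar_prev
```

Tabulating the schedule this way makes it easy to compare against the linear and cosine schedules the paper ablates: the square-root form adds noise quickly at early steps and more slowly later, which the authors argue suits low-dimensional word embeddings.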