Structured Denoising Diffusion Models in Discrete State-Spaces

Authors: Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We strongly outperform various non-autoregressive baselines on character-level text generation, and successfully scale discrete diffusion models to large vocabularies and long sequence lengths. We also achieve strong results on the image dataset CIFAR-10, approaching or exceeding the Gaussian diffusion model from Ho et al. [17] on log-likelihoods and sample quality.
Researcher Affiliation | Industry | Google Research, Brain Team {jaaustin,ddjohnson,jonathanho,dtarlow,riannevdberg}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation of the D3PM framework is available at https://github.com/google-research/google-research/tree/master/d3pm.
Open Datasets | Yes | For text, we experiment with generation on two datasets: text8 [26], a character-level dataset extracted from English-language Wikipedia, and the One Billion Word dataset (LM1B) [6], a large dataset of shuffled English-language sentences. We evaluate the performance of several D3PM models on the task of unconditional image generation with the dataset CIFAR-10 [25].
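All three corpora named in that row are publicly available. As a convenience, the minimal sketch below (an assumption, not the authors' data pipeline, which the paper does not describe) shows how CIFAR-10 and LM1B can be fetched via tensorflow_datasets; text8 is distributed as a single raw file and is usually downloaded directly.

import tensorflow_datasets as tfds

# CIFAR-10: 50,000 32x32 RGB training images, used here for unconditional generation.
cifar10_train = tfds.load("cifar10", split="train")

# LM1B (One Billion Word): shuffled English-language sentences used for text generation.
lm1b_train = tfds.load("lm1b", split="train")

# text8 (first 10^8 characters of cleaned English Wikipedia) is typically fetched
# directly as a raw text file rather than through tensorflow_datasets.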
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, citations to predefined splits, or splitting methodology) needed to reproduce the data partitioning. It mentions "training and evaluating text8 in chunks" and models "trained and evaluated on packed sequences", but gives no explicit split details.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions "sentencepiece" (with a footnote linking to its GitHub repository) but does not provide version numbers for it or for any other software dependencies or libraries needed to replicate the experiments.
Experiment Setup | Yes | We follow Hoogeboom et al. [18] and use T = 1000 timesteps, although we are also able to evaluate on fewer due to the parameterization in Section 3.3. All models were trained and evaluated on packed sequences of length 128, using a sentencepiece vocabulary of size 8192. We trained both D3PM absorbing and D3PM Gauss with the alternative loss function L_λ of Eq. (5), and we found λ = 0.001 to work best. See Appendix B.1 for more details on the experimental setup.
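For context on the loss quoted above: L_λ augments the variational bound L_vb with a λ-weighted cross-entropy term on the model's prediction of x_0. The sketch below (a minimal Python/optax illustration, assuming hypothetical helpers variational_bound and x0_logits) shows this combination with λ = 0.001; the authors' released D3PM code, not this sketch, is the authoritative implementation.

import optax

LAMBDA = 1e-3  # lambda = 0.001, the value the paper reports working best

def hybrid_loss(params, x0, xt, t, variational_bound, x0_logits):
    # L_lambda = L_vb + lambda * E_q[-log p_theta(x0 | xt)], cf. Eq. (5).
    l_vb = variational_bound(params, x0, xt, t)   # variational-bound term
    logits = x0_logits(params, xt, t)             # model's prediction of the x0 tokens
    l_ce = optax.softmax_cross_entropy_with_integer_labels(logits, x0).mean()
    return l_vb + LAMBDA * l_ce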