Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

Authors: Luhuan Wu, Brian Trippe, Christian Naesseth, David Blei, John P. Cunningham

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first find in simulation and in conditional image generation tasks that TDS provides a computational-statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models; on benchmark tasks, TDS allows flexible conditioning criteria and often outperforms the state-of-the-art, conditionally trained model.
Researcher Affiliation | Academia | Luhuan Wu (Columbia University, lw2827@columbia.edu); Brian L. Trippe (Columbia University, blt2114@columbia.edu); Christian A. Naesseth (University of Amsterdam, c.a.naesseth@uva.nl); David M. Blei (Columbia University, david.blei@columbia.edu); John P. Cunningham (Columbia University, jpc2181@columbia.edu)
Pseudocode | Yes | Algorithm 1: Twisted Diffusion Sampler (TDS); a hedged sketch of the sampler appears after the table.
Open Source Code | Yes | Code: https://github.com/blt2114/twisted_diffusion_sampler
Open Datasets | Yes | On the MNIST dataset, we compare TDS to TDS-IS, Gradient Guidance, and IS. We next apply TDS to higher-dimensional datasets. Figure 2c shows samples from TDS (K = 16) using a pre-trained diffusion model and a pre-trained classifier on the ImageNet dataset (256 × 256 × 3 dimensions).
Dataset Splits | No | The paper mentions '10,000 validation images' for inpainting tasks in Appendix D.2.2, but does not explicitly provide the training/validation/test dataset splits (e.g., percentages or counts) used for model training in a reproducible manner.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU models, CPU types) used to run its experiments.
Software Dependencies | No | The paper mentions the 'guided diffusion codebase' and a 'ResNet50 model' but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | The model architecture is based on the guided diffusion codebase with the following specifications: number of channels = 64, attention resolutions = '28,14,7', number of residual blocks = 3, learn sigma (i.e., learning the variance of p_θ(x_{t-1} | x_t)) = True, resblock updown = True, dropout = 0.1, variance schedule = 'linear'. The model was trained for 60k epochs with a batch size of 128 and a learning rate of 10^-4 on the 60k MNIST training images, using T = 1,000 steps for training and T = 100 for sampling. A hedged reconstruction of this configuration also follows the table.
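To make the Pseudocode entry concrete, below is a minimal sketch of the twisted-SMC structure behind TDS: particles are resampled by twisted weights, propagated through a guidance-shifted reverse-diffusion step, and reweighted. It assumes a 1-D standard-normal toy data distribution with a known exact score; the names (score, x0_hat, log_twist), the finite-difference twist gradient, and the simplified weight update (which omits the transition/proposal density ratio) are illustrative assumptions, not the paper's Algorithm 1.

import numpy as np

rng = np.random.default_rng(0)

T, K, D = 100, 16, 1                   # diffusion steps, particles, data dim
betas = np.linspace(1e-4, 0.2, T)      # toy linear variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

y = np.array([1.5])                    # conditioning observation, y ~ N(x0, sigma_y^2)
sigma_y = 0.5

def score(x, t):
    # Exact score for a standard-normal toy data distribution: the noised
    # marginal variance is alpha_bar * 1 + (1 - alpha_bar) = 1 at every t.
    var = alpha_bars[t] * 1.0 + (1.0 - alpha_bars[t])
    return -x / var

def x0_hat(x, t):
    # Tweedie-style denoised estimate E[x0 | xt].
    return (x + (1.0 - alpha_bars[t]) * score(x, t)) / np.sqrt(alpha_bars[t])

def log_twist(x, t):
    # Twisting function: likelihood of y evaluated at the denoised estimate.
    r = y - x0_hat(x, t)
    return -0.5 * np.sum(r * r, axis=-1) / sigma_y**2

x = rng.standard_normal((K, D))        # K particles at the final noise level
log_w = log_twist(x, T - 1)
for t in range(T - 1, 0, -1):
    # Multinomial resampling in proportion to the current twisted weights.
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    x = x[rng.choice(K, size=K, p=w)]

    # Twisted proposal: reverse-diffusion mean shifted by the twist gradient
    # (finite differences stand in for autodiff in this 1-D toy).
    eps = 1e-4
    grad = (log_twist(x + eps, t) - log_twist(x - eps, t))[:, None] / (2 * eps)
    mean = (x + betas[t] * (score(x, t) + grad)) / np.sqrt(alphas[t])
    x_next = mean + np.sqrt(betas[t]) * rng.standard_normal((K, D))

    # Incremental weight: twist at t-1 minus twist at t (the transition /
    # proposal density ratio is omitted in this simplified sketch).
    log_w = log_twist(x_next, t - 1) - log_twist(x, t)
    x = x_next

print("twisted particle mean:", x.mean(axis=0))

With K = 1 the loop reduces to plain gradient guidance; larger K is what gives the computational-statistical trade-off noted in the Research Type row.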
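The Experiment Setup row's hyperparameters can be collected as keyword arguments in the style of the guided_diffusion codebase the paper builds on. This is a hedged reconstruction: the flag names follow that codebase's conventions, and image_size is an assumption, not a value verified against the authors' repository.

# Hedged reconstruction of the reported MNIST setup (guided_diffusion-style
# flag names; not a verified config file from the authors' code).
mnist_model_config = dict(
    image_size=28,                   # MNIST resolution (assumed)
    num_channels=64,
    attention_resolutions="28,14,7",
    num_res_blocks=3,
    learn_sigma=True,                # learn the variance of p_theta(x_{t-1} | x_t)
    resblock_updown=True,
    dropout=0.1,
    noise_schedule="linear",
    diffusion_steps=1000,            # T = 1,000 at training time
    timestep_respacing="100",        # T = 100 at sampling time
)
mnist_train_config = dict(
    lr=1e-4,
    batch_size=128,
    # trained for 60k epochs on the 60k MNIST training images
)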