D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation

Authors: Abhishek Sinha, Jiaming Song, Chenlin Meng, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate and compare D2C with several state-of-the-art generative models over 6 datasets. On unconditional generation, D2C outperforms state-of-the-art VAEs and is competitive with diffusion models under similar computational budgets. On conditional generation with 100 labeled examples, D2C significantly outperforms state-of-the-art VAE [91] and diffusion models [84]. We report sample quality results in Tables 2 and 3."
Researcher Affiliation | Academia | Abhishek Sinha, Department of Computer Science, Stanford University, a7b23@stanford.edu; Jiaming Song, Department of Computer Science, Stanford University, tsong@cs.stanford.edu; Chenlin Meng, Department of Computer Science, Stanford University, chenlin@cs.stanford.edu; Stefano Ermon, Department of Computer Science, Stanford University, ermon@cs.stanford.edu
Pseudocode | Yes | "Algorithm 1: Conditional generation with D2C" (a hedged sketch of this procedure follows the table)
Open Source Code | Yes | "We release our code at https://github.com/jiamings/d2c."
Open Datasets | Yes | "We examine the conditional and unconditional generation qualities of D2C over CIFAR-10 [53], CIFAR-100 [53], fMoW [21], CelebA-64 [58], CelebA-HQ-256 [48], and FFHQ-256 [49]."
Dataset Splits | No | The paper mentions "training images" and a "test set" but provides no train/validation/test percentages or counts, and describes no cross-validation setup.
Hardware Specification | Yes | "On the same Nvidia 1080Ti GPU, it takes 0.013 seconds to obtain the latent code in D2C, while the same takes 8 seconds [106] for StyleGAN2 (615× slower)."
Software Dependencies | No | The paper names software components such as the "NVAE autoencoder structure", "U-Net diffusion model", and "MoCo-v2 contrastive representation learning method", but gives no version numbers for them or for broader frameworks such as PyTorch or TensorFlow.
Experiment Setup | Yes | "For the contrastive weight λ in Equation (4), we consider the value of λ = 10^4 based on the relative scale between the L_C and L_D2; we find that the results are relatively insensitive to λ. We use 100 diffusion steps for DDIM and D2C unless mentioned otherwise, as running with longer steps is not computationally economical despite tiny gains in FID [84]. We include additional training details, such as architectures, optimizers and learning rates, in Appendix C." (a hedged sketch of this objective also follows the table)
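
The "Pseudocode" row points to Algorithm 1, the paper's few-shot conditional generation procedure: fit a small classifier p(y|z) on the latents of the ~100 labeled examples, then rejection-sample latents from the diffusion prior and decode the accepted ones. The PyTorch sketch below is a minimal illustration of that flow, not the released code; encode, decode, sample_latent_prior, and classifier are hypothetical stand-ins (see https://github.com/jiamings/d2c for the authors' implementation).

    # Minimal sketch of D2C-style few-shot conditional generation (cf. Algorithm 1).
    # encode / decode / sample_latent_prior / classifier are assumed interfaces.
    import torch
    import torch.nn.functional as F

    def train_latent_classifier(encode, classifier, xs, ys, steps=200, lr=1e-3):
        """Fit p(y|z) on the latents of the ~100 labeled examples."""
        with torch.no_grad():
            zs = encode(xs)                          # one amortized-inference pass
        opt = torch.optim.Adam(classifier.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            F.cross_entropy(classifier(zs), ys).backward()
            opt.step()
        return classifier

    @torch.no_grad()
    def conditional_sample(sample_latent_prior, classifier, decode, label, n=16):
        """Rejection sampling: keep prior latents the classifier assigns to `label`."""
        accepted = []
        while len(accepted) < n:                     # may loop long if p(label) is tiny
            z = sample_latent_prior()                # e.g. 100 DDIM steps from noise
            p = classifier(z).softmax(dim=-1)[:, label]
            keep = torch.rand_like(p) < p            # accept z with probability p(y|z)
            accepted.extend(z[keep])
        return decode(torch.stack(accepted[:n]))     # decode accepted latents to images

Because both the classifier and the sampler operate purely in latent space, only the final decode touches pixel space, which is what makes the 100-label setting cheap.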
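
The "Experiment Setup" row quotes how the diffusion-decoding loss and the contrastive loss are combined, L_D2C = L_D2 + λ·L_C (Equation (4) in the paper). Below is a minimal sketch of that weighting under stated assumptions: reconstruction_loss, diffusion_loss, and contrastive_loss are hypothetical stand-ins for the paper's NVAE reconstruction term, latent diffusion term, and MoCo-v2-style contrastive term, and the λ value simply mirrors the excerpt above.

    # Sketch of the D2C objective from Equation (4): L_D2C = L_D2 + λ * L_C.
    # All loss callables are assumed interfaces, not the authors' code.
    LAMBDA_C = 1e4   # contrastive weight λ; the paper reports results are relatively insensitive to it

    def d2c_loss(x, encode, reconstruction_loss, diffusion_loss, contrastive_loss):
        z = encode(x)                                          # latent code from the encoder
        l_d2 = reconstruction_loss(x, z) + diffusion_loss(z)   # L_D2: decode + latent prior terms
        l_c = contrastive_loss(z)                              # L_C: MoCo-v2-style term
        return l_d2 + LAMBDA_C * l_c                           # L_D2C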