D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation
Authors: Abhishek Sinha, Jiaming Song, Chenlin Meng, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate and compare D2C with several state-of-the-art generative models over 6 datasets. On unconditional generation, D2C outperforms state-of-the-art VAEs and is competitive with diffusion models under similar computational budgets. On conditional generation with 100 labeled examples, D2C significantly outperforms state-of-the-art VAE [91] and diffusion models [84]. We report sample quality results in Tables 2 and 3. |
| Researcher Affiliation | Academia | Abhishek Sinha, Department of Computer Science, Stanford University (a7b23@stanford.edu); Jiaming Song, Department of Computer Science, Stanford University (tsong@cs.stanford.edu); Chenlin Meng, Department of Computer Science, Stanford University (chenlin@cs.stanford.edu); Stefano Ermon, Department of Computer Science, Stanford University (ermon@cs.stanford.edu) |
| Pseudocode | Yes | Algorithm 1 Conditional generation with D2C |
| Open Source Code | Yes | We release our code at https://github.com/jiamings/d2c. |
| Open Datasets | Yes | We examine the conditional and unconditional generation qualities of D2C over CIFAR-10 [53], CIFAR-100 [53], fMoW [21], CelebA-64 [58], CelebA-HQ-256 [48], and FFHQ-256 [49]. |
| Dataset Splits | No | The paper mentions “training images” and “test set” but does not provide specific percentages or counts for train/validation/test splits, nor does it describe cross-validation setups. |
| Hardware Specification | Yes | On the same Nvidia 1080Ti GPU, it takes 0.013 seconds to obtain the latent code in D2C, while the same takes 8 seconds [106] for StyleGAN2 (615× slower). |
| Software Dependencies | No | The paper mentions software components like “NVAE autoencoder structure”, “U-Net diffusion model”, and “MoCo-v2 contrastive representation learning method” but does not provide specific version numbers for these or for broader software frameworks like PyTorch or TensorFlow. |
| Experiment Setup | Yes | For the contrastive weight λ in Equation (4), we consider the value of λ = 10^4 based on the relative scale between L_C and L_D2; we find that the results are relatively insensitive to λ. We use 100 diffusion steps for DDIM and D2C unless mentioned otherwise, as running with longer steps is not computationally economical despite tiny gains in FID [84]. We include additional training details, such as architectures, optimizers and learning rates in Appendix C. |
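The experiment-setup excerpt describes a combined objective: a diffusion loss L_D2 plus a contrastive loss L_C weighted by λ. A minimal sketch of that combination is below; the function names, the MSE stand-in for the diffusion term, and the simplified InfoNCE form of the contrastive term are illustrative assumptions, not the paper's exact Equation (4).

```python
import numpy as np

def diffusion_loss(eps_pred, eps_true):
    # Simplified denoising objective: MSE between predicted and true noise,
    # standing in for the paper's diffusion loss L_D2.
    return float(np.mean((eps_pred - eps_true) ** 2))

def info_nce_loss(z1, z2, temperature=0.1):
    # Toy InfoNCE contrastive loss (MoCo-v2 style), standing in for L_C.
    # z1, z2: (batch, dim) arrays of paired latent views; positives share a row.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # positives on the diagonal

def d2c_objective(eps_pred, eps_true, z1, z2, lam=1e4):
    # Combined objective L_D2 + lambda * L_C; lam = 1e4 mirrors the
    # contrastive weight reported in the excerpt above.
    return diffusion_loss(eps_pred, eps_true) + lam * info_nce_loss(z1, z2)
```

Because the two terms live on very different scales (the diffusion loss sums over many latent dimensions), a large λ is plausible, which matches the excerpt's note that λ was chosen "based on the relative scale" of the two losses.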