Score-based Generative Modeling in Latent Space
Authors: Arash Vahdat, Karsten Kreis, Jan Kautz
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we achieve state-of-the-art 2.10 FID on CIFAR-10 and 7.22 FID on CelebA-HQ-256, and significantly improve upon likelihoods of previous SGMs. On CelebA-HQ-256, we outperform previous SGMs in synthesis speed by two orders of magnitude. We also model binarized images, MNIST and OMNIGLOT, achieving state-of-the-art likelihood on the latter. ... 5 Experiments Here, we examine the efficacy of LSGM in learning generative models for images. ... 5.2 Ablation Studies In Tab. 6, we analyze the different weighting mechanisms and variance reduction techniques and compare the geometric VPSDE with the regular VPSDE with linear β(t) [1, 2]. |
| Researcher Affiliation | Industry | Arash Vahdat, NVIDIA, avahdat@nvidia.com; Karsten Kreis, NVIDIA, kkreis@nvidia.com; Jan Kautz, NVIDIA, jkautz@nvidia.com |
| Pseudocode | No | The paper describes algorithms and derivations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/NVlabs/LSGM. |
| Open Datasets | Yes | For CIFAR-10, we train 3 different models: LSGM (FID) and LSGM (balanced) both use the VPSDE with linear β(t) and w_un-weighting for the SGM prior in Eq. 9, while performing IS as derived in Sec. 3.4. ... For CelebA-HQ-256, we observe that when LSGM is trained with different SDE types and weighting mechanisms, it often obtains similar NELBO potentially due to applying the SGM prior only to small latent variable groups and using Normal priors at the larger groups. ... We apply LSGM to binary images using a decoder with pixel-wise independent Bernoulli distributions. For these datasets, we report both NELBO and NLL in nats in Tab. 4 and Tab. 5. On OMNIGLOT, LSGM achieves state-of-the-art likelihood of 87.79 nat, outperforming previous models including VAEs with autoregressive decoders, and even when comparing its NELBO against importance weighted estimation of NLL for other methods. On MNIST, LSGM outperforms previous VAEs in NELBO, reaching a NELBO 1.09 nat lower than the state-of-the-art NVAE. |
| Dataset Splits | No | The paper mentions training on datasets and reports results but does not specify the train/validation/test splits used. |
| Hardware Specification | Yes | PC sampling involves 4000 NFEs and takes 44.6 min. on a Titan V for a batch of 16 images. |
| Software Dependencies | No | The paper states: 'We implement LSGM using the NVAE [20] architecture as VAE backbone and NCSN++ [2] as SGM backbone.' and mentions using a 'black-box ODE solver [73]' but does not provide specific version numbers for any software. |
| Experiment Setup | Yes | Implementation details: We implement LSGM using the NVAE [20] architecture as VAE backbone and NCSN++ [2] as SGM backbone. NVAE has a hierarchical latent structure. The diffusion process input z_0 is constructed by concatenating the latent variables from all groups in the channel dimension. For NVAEs with multiple spatial resolutions in latent groups, we only feed the smallest resolution groups to the SGM prior and assume that the remaining groups have a standard Normal distribution. (A minimal code sketch of this latent wiring follows the table.) |
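
The Experiment Setup row above describes how the hierarchical NVAE latents are wired into the SGM prior: the smallest-resolution latent groups are concatenated along the channel dimension to form the diffusion input z_0, while the remaining groups keep a standard Normal prior. The PyTorch sketch below illustrates only that wiring; the shapes, group counts, and stand-in score network are assumptions for illustration, not the authors' implementation (see https://github.com/NVlabs/LSGM for the actual code).

```python
import torch

# Hypothetical hierarchical latents from an NVAE-style encoder.
# Shapes and group counts are illustrative, not taken from the paper.
small_groups = [torch.randn(16, 20, 8, 8) for _ in range(3)]    # smallest spatial resolution -> SGM prior
large_groups = [torch.randn(16, 20, 16, 16) for _ in range(2)]  # larger resolutions -> standard Normal prior

# Diffusion input z_0: concatenate the smallest-resolution groups in the channel dimension.
z_0 = torch.cat(small_groups, dim=1)  # shape (16, 60, 8, 8)

# Stand-in for the NCSN++ score network; the real model also conditions on the diffusion time t.
score_net = torch.nn.Conv2d(z_0.shape[1], z_0.shape[1], kernel_size=3, padding=1)
score = score_net(z_0)  # placeholder for score_net(z_t, t)

# Remaining groups keep a standard Normal prior, i.e. a fixed log-density term per sample.
normal = torch.distributions.Normal(0.0, 1.0)
log_p_large = sum(normal.log_prob(z).sum(dim=(1, 2, 3)) for z in large_groups)
```

Restricting the SGM prior to the small-resolution groups presumably keeps the score network's input, and hence its sampling cost, small, which is consistent with the synthesis-speed figures quoted in the Research Type row.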
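
The ablation quoted under Research Type compares the paper's geometric VPSDE against the regular VPSDE with a linear β(t). Below is a minimal sketch of the two noise schedules, assuming the common linear schedule β(t) = β_min + (β_max − β_min)·t and a geometrically interpolated marginal variance σ²(t) = σ²_min·(σ²_max/σ²_min)^t; the endpoint constants are typical defaults, not values checked against the paper.

```python
import numpy as np

def linear_beta(t, beta_min=0.1, beta_max=20.0):
    """Regular VPSDE: beta(t) grows linearly in t (constants are common defaults, assumed here)."""
    return beta_min + (beta_max - beta_min) * t

def linear_vpsde_var(t, beta_min=0.1, beta_max=20.0):
    """Marginal variance sigma^2(t) = 1 - exp(-integral_0^t beta(s) ds) for the linear schedule."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    return 1.0 - np.exp(-integral)

def geometric_vpsde_var(t, sigma2_min=3e-5, sigma2_max=0.999):
    """Geometric VPSDE: marginal variance interpolated geometrically between assumed endpoints."""
    return sigma2_min * (sigma2_max / sigma2_min) ** t

t = np.linspace(0.0, 1.0, 5)
print(linear_vpsde_var(t))     # variance under the linear-beta VPSDE
print(geometric_vpsde_var(t))  # variance under the geometric VPSDE
```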