Score-based Generative Modeling in Latent Space
Authors: Arash Vahdat, Karsten Kreis, Jan Kautz
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we achieve state-of-the-art 2.10 FID on CIFAR-10 and 7.22 FID on CelebA-HQ-256, and significantly improve upon likelihoods of previous SGMs. On CelebA-HQ-256, we outperform previous SGMs in synthesis speed by two orders of magnitude. We also model binarized images, MNIST and OMNIGLOT, achieving state-of-the-art likelihood on the latter. ... 5 Experiments Here, we examine the efficacy of LSGM in learning generative models for images. ... 5.2 Ablation Studies In Tab. 6, we analyze the different weighting mechanisms and variance reduction techniques and compare the geometric VPSDE with the regular VPSDE with linear β(t) [1, 2]. |
| Researcher Affiliation | Industry | Arash Vahdat, NVIDIA, avahdat@nvidia.com; Karsten Kreis, NVIDIA, kkreis@nvidia.com; Jan Kautz, NVIDIA, jkautz@nvidia.com |
| Pseudocode | No | The paper describes algorithms and derivations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/NVlabs/LSGM. |
| Open Datasets | Yes | For CIFAR-10, we train 3 different models: LSGM (FID) and LSGM (balanced) both use the VPSDE with linear β(t) and w_un-weighting for the SGM prior in Eq. 9, while performing IS as derived in Sec. 3.4. ... For CelebA-HQ-256, we observe that when LSGM is trained with different SDE types and weighting mechanisms, it often obtains similar NELBO potentially due to applying the SGM prior only to small latent variable groups and using Normal priors at the larger groups. ... We apply LSGM to binary images using a decoder with pixel-wise independent Bernoulli distributions. For these datasets, we report both NELBO and NLL in nats in Tab. 4 and Tab. 5. On OMNIGLOT, LSGM achieves state-of-the-art likelihood of 87.79 nat, outperforming previous models including VAEs with autoregressive decoders, and even when comparing its NELBO against importance weighted estimation of NLL for other methods. On MNIST, LSGM outperforms previous VAEs in NELBO, reaching a NELBO 1.09 nat lower than the state-of-the-art NVAE. |
| Dataset Splits | No | The paper mentions training on datasets and reports results but does not specify the train/validation/test splits used. |
| Hardware Specification | Yes | PC sampling involves 4000 NFEs and takes 44.6 min. on a Titan V for a batch of 16 images. |
| Software Dependencies | No | The paper states: 'We implement LSGM using the NVAE [20] architecture as VAE backbone and NCSN++ [2] as SGM backbone.' and mentions using a 'black-box ODE solver [73]' but does not provide specific version numbers for any software. |
| Experiment Setup | Yes | Implementation details: We implement LSGM using the NVAE [20] architecture as VAE backbone and NCSN++ [2] as SGM backbone. NVAE has a hierarchical latent structure. The diffusion process input z_0 is constructed by concatenating the latent variables from all groups in the channel dimension. For NVAEs with multiple spatial resolutions in latent groups, we only feed the smallest resolution groups to the SGM prior and assume that the remaining groups have a standard Normal distribution. (A minimal code sketch of this latent wiring follows the table.) |
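
The Experiment Setup row above describes how the hierarchical NVAE latents are wired into the SGM prior: the smallest-resolution latent groups are concatenated along the channel dimension to form the diffusion input z_0, while the remaining groups keep a standard Normal prior. The PyTorch sketch below illustrates only that wiring; the shapes, group counts, and stand-in score network are assumptions for illustration, not the authors' implementation (see https://github.com/NVlabs/LSGM for the actual code).

```python
import torch

# Hypothetical hierarchical latents from an NVAE-style encoder.
# Shapes and group counts are illustrative, not taken from the paper.
small_groups = [torch.randn(16, 20, 8, 8) for _ in range(3)]    # smallest spatial resolution -> SGM prior
large_groups = [torch.randn(16, 20, 16, 16) for _ in range(2)]  # larger resolutions -> standard Normal prior

# Diffusion input z_0: concatenate the smallest-resolution groups in the channel dimension.
z_0 = torch.cat(small_groups, dim=1)  # shape (16, 60, 8, 8)

# Stand-in for the NCSN++ score network; the real model also conditions on the diffusion time t.
score_net = torch.nn.Conv2d(z_0.shape[1], z_0.shape[1], kernel_size=3, padding=1)
score = score_net(z_0)  # placeholder for score_net(z_t, t)

# Remaining groups keep a standard Normal prior, i.e. a fixed log-density term per sample.
normal = torch.distributions.Normal(0.0, 1.0)
log_p_large = sum(normal.log_prob(z).sum(dim=(1, 2, 3)) for z in large_groups)
```

Restricting the SGM prior to the small-resolution groups presumably keeps the score network's input, and hence its sampling cost, small, which is consistent with the synthesis-speed figures quoted in the Research Type row.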
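
The ablation quoted under Research Type compares the paper's geometric VPSDE against the regular VPSDE with a linear β(t). Below is a minimal sketch of the two noise schedules, assuming the common linear schedule β(t) = β_min + (β_max − β_min)·t and a geometrically interpolated marginal variance σ²(t) = σ²_min·(σ²_max/σ²_min)^t; the endpoint constants are typical defaults, not values checked against the paper.

```python
import numpy as np

def linear_beta(t, beta_min=0.1, beta_max=20.0):
    """Regular VPSDE: beta(t) grows linearly in t (constants are common defaults, assumed here)."""
    return beta_min + (beta_max - beta_min) * t

def linear_vpsde_var(t, beta_min=0.1, beta_max=20.0):
    """Marginal variance sigma^2(t) = 1 - exp(-integral_0^t beta(s) ds) for the linear schedule."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    return 1.0 - np.exp(-integral)

def geometric_vpsde_var(t, sigma2_min=3e-5, sigma2_max=0.999):
    """Geometric VPSDE: marginal variance interpolated geometrically between assumed endpoints."""
    return sigma2_min * (sigma2_max / sigma2_min) ** t

t = np.linspace(0.0, 1.0, 5)
print(linear_vpsde_var(t))     # variance under the linear-beta VPSDE
print(geometric_vpsde_var(t))  # variance under the geometric VPSDE
```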