NVAE: A Deep Hierarchical Variational Autoencoder

Authors: Arash Vahdat, Jan Kautz

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels.
Researcher Affiliation | Industry | Arash Vahdat, Jan Kautz, NVIDIA, {avahdat, jkautz}@nvidia.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; it provides architectural diagrams and mathematical formulations instead.
Open Source Code | Yes | The source code is available at https://github.com/NVlabs/NVAE.
Open Datasets | Yes | We examine NVAE on the dynamically binarized MNIST [72], CIFAR-10 [73], ImageNet 32×32 [74], CelebA 64×64 [75, 76], CelebA HQ 256×256 [28], and FFHQ 256×256 [77] datasets.
Dataset Splits | No | The paper reports results on these datasets, implying standard splits, but it does not explicitly state the train/validation/test splits (e.g., percentages or exact sample counts per partition) needed for reproducibility.
Hardware Specification | Yes | On a 12-GB Titan V GPU, we can sample a batch of 36 images of the size 256×256 px in 2.03 seconds (56 ms/image).
Software Dependencies | No | The paper mentions using the 'NVIDIA APEX library [54]' but does not provide version numbers for it or for any other key software dependencies (e.g., the deep learning framework or Python version) required for reproducibility.
Experiment Setup | Yes | For large image datasets such as CelebA HQ and FFHQ, NVAE consists of 36 groups of latent variables starting from 8×8 dims, scaled up to 128×128 dims with two residual cells per latent-variable group. The implementation details are provided in Sec. A in the Appendix. We apply the KL balancing mechanism only during KL warm-up (the first 25000 iterations).
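
The experiment-setup row above notes that the KL balancing mechanism is applied only during KL warm-up (the first 25000 iterations). As a rough illustration of how such a schedule can be wired into a training loop, here is a minimal PyTorch sketch; the balancing-coefficient formula is a simplified assumption on our part (the paper's Appendix A also scales the coefficients by group size), and the authors' actual implementation lives in the linked GitHub repository.

```python
import torch

WARMUP_ITERS = 25000  # "the first 25000 iterations" quoted in the row above

def kl_coeff(step, warmup_iters=WARMUP_ITERS, min_coeff=1e-4):
    # Linearly anneal the global KL weight from ~0 to 1 over the warm-up period.
    return max(min(step / warmup_iters, 1.0), min_coeff)

def total_kl(kl_per_group, step):
    # kl_per_group: list of per-group KL tensors, each averaged over the batch.
    kl_values = torch.stack([kl.mean() for kl in kl_per_group])
    if step < WARMUP_ITERS:
        # Hypothetical balancing weights: proportional to each group's current KL,
        # normalized so the weights sum to the number of groups. This keeps any
        # single group from collapsing early in training.
        with torch.no_grad():
            gamma = kl_values / (kl_values.sum() + 1e-8) * len(kl_per_group)
        kl_total = (gamma * kl_values).sum()
    else:
        # After warm-up, plain unweighted KL, matching the excerpt above.
        kl_total = kl_values.sum()
    return kl_coeff(step) * kl_total
```

In a training step one would use something like `loss = recon_loss + total_kl(kl_per_group, step)`; the `min_coeff` floor merely keeps the KL term from vanishing entirely at step 0.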
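
The results row above reports likelihood in bits per dimension (bpd), the standard unit for comparing likelihood-based models across image sizes: a negative log-likelihood measured in nats is divided by the number of data dimensions and by ln 2. A small worked example follows; the helper function name is ours, not from the paper.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    # Convert a per-image negative log-likelihood (in nats) to bits per dimension.
    return nll_nats / (num_dims * math.log(2))

# A CIFAR-10 image has 3 * 32 * 32 = 3072 dimensions, so the reported
# 2.91 bits/dim corresponds to a per-image NLL of roughly
# 2.91 * 3072 * ln(2) ≈ 6196 nats.
print(bits_per_dim(6196.0, 3 * 32 * 32))  # ≈ 2.91
```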