NVAE: A Deep Hierarchical Variational Autoencoder
Authors: Arash Vahdat, Jan Kautz
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels. |
| Researcher Affiliation | Industry | Arash Vahdat, Jan Kautz NVIDIA {avahdat, jkautz}@nvidia.com |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides architectural diagrams and mathematical formulations instead. |
| Open Source Code | Yes | The source code is available at https://github.com/NVlabs/NVAE. |
| Open Datasets | Yes | We examine NVAE on the dynamically binarized MNIST [72], CIFAR-10 [73], ImageNet 32×32 [74], CelebA 64×64 [75, 76], CelebA HQ 256×256 [28], and FFHQ 256×256 [77] datasets. |
| Dataset Splits | No | The paper reports results on standard benchmark datasets, implying conventional splits, but it does not explicitly state the train/validation/test partitions (e.g., percentages or exact sample counts for each partition) needed for reproducibility. |
| Hardware Specification | Yes | On a 12-GB Titan V GPU, we can sample a batch of 36 images of the size 256×256 px in 2.03 seconds (56 ms/image). |
| Software Dependencies | No | The paper mentions using the 'NVIDIA APEX library [54]' but does not provide specific version numbers for this or any other key software dependencies (e.g., deep learning frameworks, Python version) required for reproducibility. |
| Experiment Setup | Yes | For large image datasets such as CelebA HQ and FFHQ, NVAE consists of 36 groups of latent variables starting from 8×8 dims, scaled up to 128×128 dims with two residual cells per latent variable group. The implementation details are provided in Sec. A in Appendix. We apply KL balancing mechanism only during KL warm-up (the first 25000 iterations). |
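The KL schedule quoted in the Experiment Setup row (a warm-up over the first 25,000 iterations, with per-group KL balancing applied only during warm-up) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not code from the NVlabs/NVAE repository: the linear warm-up shape, the function names, and the normalization of the balancing weights are all assumptions made for clarity.

```python
# Hypothetical sketch of NVAE-style KL warm-up with per-group balancing.
# Assumption: beta ramps linearly to 1.0 over the warm-up window, and the
# balancing weights are proportional to each group's current KL magnitude
# (normalized to sum to the number of groups) so no group collapses early.

WARMUP_ITERS = 25_000  # "the first 25000 iterations" from the paper


def kl_coefficients(step, kl_per_group):
    """Return (beta, per-group weights) for a given training step.

    kl_per_group: list of average KL values, one per latent-variable group.
    """
    beta = min(step / WARMUP_ITERS, 1.0)  # linear KL warm-up (assumed shape)
    if step < WARMUP_ITERS:
        # Balancing: weight each group by its share of the total KL,
        # scaled so the weights sum to the number of groups.
        total = sum(kl_per_group) or 1.0
        n = len(kl_per_group)
        weights = [n * kl / total for kl in kl_per_group]
    else:
        # After warm-up the paper's balancing mechanism is switched off,
        # leaving the plain (unweighted) KL terms.
        weights = [1.0] * len(kl_per_group)
    return beta, weights


def total_loss(recon_loss, kl_per_group, step):
    """Negative ELBO with warmed-up, balanced KL terms."""
    beta, weights = kl_coefficients(step, kl_per_group)
    return recon_loss + beta * sum(w * kl for w, kl in zip(weights, kl_per_group))
```

For example, at step 0 the KL contribution is zero regardless of the balancing weights, and after iteration 25,000 the loss reduces to the ordinary reconstruction-plus-KL objective with unit weights.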