Generating Diverse High-Fidelity Images with VQ-VAE-2

Authors: Ali Razavi, Aaron van den Oord, Oriol Vinyals

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GAN's known shortcomings such as mode collapse and lack of diversity." (Section 5: Experiments)
Researcher Affiliation | Industry | Ali Razavi (DeepMind, alirazavi@google.com), Aäron van den Oord (DeepMind, avdnoord@google.com), Oriol Vinyals (DeepMind, vinyals@google.com)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We use the released VQ-VAE implementation in the Sonnet library": https://github.com/deepmind/sonnet/blob/master/sonnet/python/modules/nets/vqvae.py and https://github.com/deepmind/sonnet/blob/master/sonnet/examples/vqvae_example.ipynb
Open Datasets | Yes | "We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet... To further assess the effectiveness of our multi-scale approach for capturing extremely long range dependencies in the data, we train a three level hierarchical model over the FFHQ dataset [15] at 1024×1024 resolution."
Dataset Splits | Yes | "Table 1: Train and validation negative log-likelihood (NLL) for top and bottom prior measured by encoding train and validation set resp., as well as Mean Squared Error for train and validation set. The small difference in both NLL and MSE suggests that neither the prior network nor the VQ-VAE overfit."
Hardware Specification | No | The paper mentions training on TPUs and "hardware accelerators" but does not provide specific hardware details such as exact GPU/CPU models, processor types, or TPU versions used for the experiments.
Software Dependencies | No | The paper mentions using the Sonnet library for the VQ-VAE implementation, but does not provide version numbers for Sonnet or any other software dependencies.
Experiment Setup | Yes | "The operator sg refers to a stop-gradient operation that blocks gradients from flowing into its argument, and β is a hyperparameter which controls the reluctance to change the code corresponding to the encoder output. γ is a decay parameter with a value between 0 and 1 (default γ = 0.99 is used in all experiments)."
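The stop-gradient operator sg, the commitment coefficient β, and the EMA decay γ quoted in the Experiment Setup row can be sketched in a few lines. The following is a minimal NumPy illustration of the forward-pass mechanics, not the paper's Sonnet/TensorFlow implementation; all function and variable names are illustrative, and in NumPy sg[·] simply means the quantized codes are treated as constants rather than differentiated through.

```python
import numpy as np

def quantize(z_e, codebook):
    """Assign each encoder output vector to its nearest codebook entry."""
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    idx = d.argmin(1)
    return codebook[idx], idx

def commitment_loss(z_e, z_q, beta=0.25):
    """beta * ||z_e - sg[z_q]||^2: pulls encoder outputs toward their codes.

    sg[] (stop-gradient) means z_q contributes no gradient; here we just
    compute the forward value.
    """
    return beta * ((z_e - z_q) ** 2).mean()

def ema_update(codebook, counts, sums, z_e, idx, gamma=0.99, eps=1e-5):
    """Exponential-moving-average codebook update with decay gamma (default 0.99)."""
    K = codebook.shape[0]
    one_hot = np.eye(K)[idx]                              # (N, K) hard assignments
    counts = gamma * counts + (1 - gamma) * one_hot.sum(0)  # EMA of cluster sizes
    sums = gamma * sums + (1 - gamma) * one_hot.T @ z_e     # EMA of assigned vectors
    n = counts.sum()
    smoothed = (counts + eps) / (n + K * eps) * n           # Laplace smoothing of counts
    return sums / smoothed[:, None], counts, sums
```

With a random codebook and batch of encoder outputs, `quantize` returns the nearest codes, `commitment_loss` gives the β-weighted penalty, and repeated calls to `ema_update` move codebook entries toward the mean of the vectors assigned to them at a rate controlled by γ.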