Consistency Regularization for Variational Auto-Encoders

Authors: Samarth Sinha, Adji Bousso Dieng

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments (see Section 4), we apply the proposed technique to four VAE variants: the original VAE (Kingma & Welling, 2013), the importance-weighted auto-encoder (IWAE) (Burda et al., 2015), the β-VAE (Higgins et al., 2017), and the nouveau variational auto-encoder (NVAE) (Vahdat & Kautz, 2020). We found, on four different benchmark datasets, that CR-VAEs always yield better representations and generalize better than their base VAEs.
Researcher Affiliation | Collaboration | Samarth Sinha (Vector Institute, University of Toronto); Adji B. Dieng (Google Brain, Princeton University)
Pseudocode | Yes | Algorithm 1: Consistency Regularization for Variational Autoencoders (see the objective sketch after the table)
Open Source Code | Yes | Code for this work can be found at https://github.com/sinhasam/CRVAE
Open Datasets | Yes | We first consider MNIST, a handwritten digit recognition dataset with 60,000 images in the training set and 10,000 images in the test set (LeCun, 1998). ... We also consider Omniglot, a handwritten alphabet recognition dataset (Lake et al., 2011). ... Finally we consider CelebA, a dataset of faces consisting of 162,770 images for training, 19,867 images for validation, and 19,962 images for testing (Liu et al., 2018).
Dataset Splits | Yes | We form a validation set of 10,000 images randomly sampled from the training set. ... We use 16,280 randomly sampled images for training, 1,000 for validation, and the remaining 2,000 samples for testing. ... It is a dataset of faces, consisting of 162,770 images for training, 19,867 images for validation, and 19,962 images for testing (Liu et al., 2018). (See the split sketch after the table.)
Hardware Specification | Yes | All experiments were done on a GPU cluster consisting of Nvidia P100 and RTX GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify versions for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The networks are trained with the Adam optimizer with a learning rate of 10^-4 (Kingma & Ba, 2014) for 100 epochs with a batch size of 64. We set the dimensionality of the latent variables to 50; therefore the maximum number of active latent units in the latent space is 50. We found λ = 0.1 to be best according to cross-validation using held-out log-likelihood, exploring the range [1e-4, 1.0] across datasets. In an ablation study we explore λ = 0. For the β-VAE we set λ = 0.1β and study both β = 0.1 and β = 10, two regimes under which the β-VAE performs qualitatively very differently (Higgins et al., 2017). (See the configuration sketch after the table.)
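
The Pseudocode row above refers to Algorithm 1, the consistency-regularized training objective. Below is a minimal PyTorch sketch of one plausible reading of that objective: the ELBO is evaluated on the original input and on a random augmentation, and a λ-weighted KL term encourages the two approximate posteriors to agree. The helper names, the `augment` transform, and the direction of the consistency KL are assumptions made for illustration, not the authors' released implementation.

    # Sketch of a consistency-regularized VAE objective, assuming a standard
    # Gaussian encoder/decoder pair with Bernoulli likelihood on the pixels.
    import torch
    import torch.nn.functional as F

    def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
        """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over latent dims."""
        return 0.5 * torch.sum(
            logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
            - 1.0,
            dim=-1,
        )

    def negative_elbo(x, encoder, decoder):
        """Per-example reconstruction loss plus KL to the standard-normal prior."""
        mu, logvar = encoder(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        x_logits = decoder(z)
        rec = F.binary_cross_entropy_with_logits(
            x_logits, x, reduction="none"
        ).flatten(1).sum(-1)
        kl_prior = gaussian_kl(mu, logvar, torch.zeros_like(mu), torch.zeros_like(logvar))
        return rec + kl_prior, (mu, logvar)

    def crvae_loss(x, encoder, decoder, augment, lam=0.1):
        """ELBO terms for x and an augmented x, plus a consistency KL between the two posteriors."""
        x_aug = augment(x)
        loss_x, (mu, logvar) = negative_elbo(x, encoder, decoder)
        loss_aug, (mu_aug, logvar_aug) = negative_elbo(x_aug, encoder, decoder)
        # Consistency term: posterior of the augmented view vs. posterior of the original
        # (the direction of this KL is an assumption here).
        consistency = gaussian_kl(mu_aug, logvar_aug, mu, logvar)
        return (loss_x + loss_aug + lam * consistency).mean()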
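The Dataset Splits row reports a 50,000/10,000 split of the MNIST training set and a 16,280/1,000/2,000 split of Omniglot. A hedged sketch of how such splits could be formed with torchvision and torch.utils.data.random_split follows; the data root, the fixed seed, and the use of the Omniglot background set (19,280 images) are assumptions, since the released code may split differently.

    # Sketch of the reported validation splits; paths and seed are placeholders.
    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    mnist_full = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
    # 10,000 validation images randomly sampled from the 60,000 MNIST training images.
    mnist_train, mnist_val = random_split(
        mnist_full, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
    )

    # Omniglot background set (19,280 images) split into 16,280 / 1,000 / 2,000.
    omniglot = datasets.Omniglot("data", background=True, download=True,
                                 transform=transforms.ToTensor())
    omni_train, omni_val, omni_test = random_split(
        omniglot, [16_280, 1_000, 2_000], generator=torch.Generator().manual_seed(0)
    )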
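The Experiment Setup row pins down the main hyperparameters: Adam, learning rate 10^-4, 100 epochs, batch size 64, latent dimensionality 50, and λ = 0.1. The sketch below wires those values into a training loop that reuses the crvae_loss and mnist_train placeholders from the sketches above; the Encoder and Decoder classes and the augment transform remain hypothetical stand-ins for the architectures in the released code.

    # Sketch of the reported training configuration (values taken from the row above).
    import torch
    from torch.utils.data import DataLoader

    LATENT_DIM = 50       # dimensionality of the latent variables
    BATCH_SIZE = 64
    EPOCHS = 100
    LEARNING_RATE = 1e-4
    LAMBDA = 0.1          # selected by cross-validation over [1e-4, 1.0] on held-out likelihood

    encoder = Encoder(latent_dim=LATENT_DIM)   # hypothetical encoder module
    decoder = Decoder(latent_dim=LATENT_DIM)   # hypothetical decoder module
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=LEARNING_RATE
    )
    loader = DataLoader(mnist_train, batch_size=BATCH_SIZE, shuffle=True)

    for epoch in range(EPOCHS):
        for x, _ in loader:
            loss = crvae_loss(x, encoder, decoder, augment, lam=LAMBDA)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()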