Consistency Regularization for Variational Auto-Encoders

Authors: Samarth Sinha, Adji Bousso Dieng

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments (see Section 4), we apply the proposed technique to four VAE variants: the original VAE (Kingma & Welling, 2013), the importance-weighted auto-encoder (IWAE) (Burda et al., 2015), the β-VAE (Higgins et al., 2017), and the nouveau variational auto-encoder (NVAE) (Vahdat & Kautz, 2020). We found, on four different benchmark datasets, that CR-VAEs always yield better representations and generalize better than their base VAEs.
Researcher Affiliation | Collaboration | Samarth Sinha (Vector Institute, University of Toronto); Adji B. Dieng (Google Brain, Princeton University)
Pseudocode | Yes | Algorithm 1: Consistency Regularization for Variational Autoencoders (see the objective sketch after the table)
Open Source Code | Yes | Code for this work can be found at https://github.com/sinhasam/CRVAE
Open Datasets | Yes | We first consider MNIST, a handwritten digit recognition dataset with 60,000 images in the training set and 10,000 images in the test set (LeCun, 1998). ... We also consider Omniglot, a handwritten alphabet recognition dataset (Lake et al., 2011). ... Finally we consider CelebA, a dataset of faces consisting of 162,770 images for training, 19,867 images for validation, and 19,962 images for testing (Liu et al., 2018).
Dataset Splits | Yes | We form a validation set of 10,000 images randomly sampled from the training set. ... We use 16,280 randomly sampled images for training, 1,000 for validation, and the remaining 2,000 samples for testing. ... It is a dataset of faces, consisting of 162,770 images for training, 19,867 images for validation, and 19,962 images for testing (Liu et al., 2018). (See the split sketch after the table.)
Hardware Specification | Yes | All experiments were done on a GPU cluster consisting of Nvidia P100 and RTX GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify versions for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The networks are trained with the Adam optimizer with a learning rate of 10^-4 (Kingma & Ba, 2014) for 100 epochs with a batch size of 64. We set the dimensionality of the latent variables to 50; therefore the maximum number of active latent units in the latent space is 50. We found λ = 0.1 to be best according to cross-validation using held-out log-likelihood, exploring the range [1e-4, 1.0] across datasets. In an ablation study we explore λ = 0. For the β-VAE we set λ = 0.1β and study both β = 0.1 and β = 10, two regimes under which the β-VAE performs qualitatively very differently (Higgins et al., 2017). (See the configuration sketch after the table.)
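
The Pseudocode row above refers to Algorithm 1, the consistency-regularized training objective. Below is a minimal PyTorch sketch of one plausible reading of that objective: the ELBO is evaluated on the original input and on a random augmentation, and a λ-weighted KL term encourages the two approximate posteriors to agree. The helper names, the `augment` transform, and the direction of the consistency KL are assumptions made for illustration, not the authors' released implementation.

    # Sketch of a consistency-regularized VAE objective, assuming a standard
    # Gaussian encoder/decoder pair with Bernoulli likelihood on the pixels.
    import torch
    import torch.nn.functional as F

    def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
        """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over latent dims."""
        return 0.5 * torch.sum(
            logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
            - 1.0,
            dim=-1,
        )

    def negative_elbo(x, encoder, decoder):
        """Per-example reconstruction loss plus KL to the standard-normal prior."""
        mu, logvar = encoder(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        x_logits = decoder(z)
        rec = F.binary_cross_entropy_with_logits(
            x_logits, x, reduction="none"
        ).flatten(1).sum(-1)
        kl_prior = gaussian_kl(mu, logvar, torch.zeros_like(mu), torch.zeros_like(logvar))
        return rec + kl_prior, (mu, logvar)

    def crvae_loss(x, encoder, decoder, augment, lam=0.1):
        """ELBO terms for x and an augmented x, plus a consistency KL between the two posteriors."""
        x_aug = augment(x)
        loss_x, (mu, logvar) = negative_elbo(x, encoder, decoder)
        loss_aug, (mu_aug, logvar_aug) = negative_elbo(x_aug, encoder, decoder)
        # Consistency term: posterior of the augmented view vs. posterior of the original
        # (the direction of this KL is an assumption here).
        consistency = gaussian_kl(mu_aug, logvar_aug, mu, logvar)
        return (loss_x + loss_aug + lam * consistency).mean()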
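The Dataset Splits row reports a 50,000/10,000 split of the MNIST training set and a 16,280/1,000/2,000 split of Omniglot. A hedged sketch of how such splits could be formed with torchvision and torch.utils.data.random_split follows; the data root, the fixed seed, and the use of the Omniglot background set (19,280 images) are assumptions, since the released code may split differently.

    # Sketch of the reported validation splits; paths and seed are placeholders.
    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    mnist_full = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
    # 10,000 validation images randomly sampled from the 60,000 MNIST training images.
    mnist_train, mnist_val = random_split(
        mnist_full, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
    )

    # Omniglot background set (19,280 images) split into 16,280 / 1,000 / 2,000.
    omniglot = datasets.Omniglot("data", background=True, download=True,
                                 transform=transforms.ToTensor())
    omni_train, omni_val, omni_test = random_split(
        omniglot, [16_280, 1_000, 2_000], generator=torch.Generator().manual_seed(0)
    )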
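The Experiment Setup row pins down the main hyperparameters: Adam, learning rate 10^-4, 100 epochs, batch size 64, latent dimensionality 50, and λ = 0.1. The sketch below wires those values into a training loop that reuses the crvae_loss and mnist_train placeholders from the sketches above; the Encoder and Decoder classes and the augment transform remain hypothetical stand-ins for the architectures in the released code.

    # Sketch of the reported training configuration (values taken from the row above).
    import torch
    from torch.utils.data import DataLoader

    LATENT_DIM = 50       # dimensionality of the latent variables
    BATCH_SIZE = 64
    EPOCHS = 100
    LEARNING_RATE = 1e-4
    LAMBDA = 0.1          # selected by cross-validation over [1e-4, 1.0] on held-out likelihood

    encoder = Encoder(latent_dim=LATENT_DIM)   # hypothetical encoder module
    decoder = Decoder(latent_dim=LATENT_DIM)   # hypothetical decoder module
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=LEARNING_RATE
    )
    loader = DataLoader(mnist_train, batch_size=BATCH_SIZE, shuffle=True)

    for epoch in range(EPOCHS):
        for x, _ in loader:
            loss = crvae_loss(x, encoder, decoder, augment, lam=LAMBDA)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()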