On Implicit Regularization in $β$-VAEs
Authors: Abhishek Kumar, Ben Poole
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments to validate the correctness of our theory and accuracy of our approximations. On the MNIST dataset, we empirically verify the relationship between variational distribution covariance and the Jacobian of the decoder. ... We also train models using a tractable lower bound on the deterministic objectives for CelebA and find that trained models behave similarly to β-VAE in terms of sample quality. |
| Researcher Affiliation | Industry | 1Google Research, Brain Team. Correspondence to: Abhishek Kumar <abhishk@google.com>, Ben Poole <pooleb@google.com>. |
| Pseudocode | No | The paper describes mathematical derivations and experimental procedures, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing its code or links to a code repository. |
| Open Datasets | Yes | We conduct experiments on MNIST (LeCun et al., 1998) and CelebA (Liu et al., 2015) |
| Dataset Splits | Yes | Standard train-test splits are used in all experiments for both datasets. ... We evaluate all three objectives on a fixed held-out test set of 5000 examples on checkpoints from the stochastic β-VAE model. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, or cloud computing instances with specifications). |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All MNIST and CelebA models are trained for 20k and 50k iterations respectively, using the Adam optimizer (Kingma & Ba, 2015). We use 5-layer CNN architectures for both the encoder and decoder, with ELU activations (Clevert et al., 2016) in all hidden layers. (A minimal sketch of this setup appears below the table.) |
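
The Experiment Setup row pins down the training recipe but, as noted in the Open Source Code row, no code is released. Below is a minimal PyTorch sketch of the MNIST configuration described there: a 5-layer CNN encoder and decoder with ELU activations, the standard β-VAE objective, and Adam for 20k iterations. The latent size, channel widths, learning rate, batch size, and β value are illustrative assumptions, not the authors' settings.

```python
# Minimal beta-VAE training sketch (PyTorch). Only the 5-layer CNN encoder/decoder
# with ELU activations, the Adam optimizer, the 20k-iteration budget, and the
# beta-weighted KL term follow the paper's reported setup; everything else
# (latent_dim, beta, channel widths, lr, batch size) is an assumption.
from itertools import cycle, islice

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class BetaVAE(nn.Module):
    def __init__(self, latent_dim=16, beta=4.0):
        super().__init__()
        self.beta = beta
        # Encoder: 5 conv layers with ELU activations, then Gaussian posterior parameters.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ELU(),   # 28 -> 14
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ELU(),  # 14 -> 7
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ELU(),  # 7 -> 4
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ELU(),  # 4 -> 2
            nn.Conv2d(64, 128, 2), nn.ELU(),                      # 2 -> 1
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128, latent_dim)
        self.fc_logvar = nn.Linear(128, latent_dim)
        # Decoder: 5 transposed-conv layers mirroring the encoder.
        self.fc_dec = nn.Linear(latent_dim, 128)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2), nn.ELU(),                                         # 1 -> 2
            nn.ConvTranspose2d(64, 64, 3, stride=2, padding=1, output_padding=1), nn.ELU(),   # 2 -> 4
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1), nn.ELU(),                     # 4 -> 7
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ELU(),                     # 7 -> 14
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),                                # 14 -> 28
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)            # reparameterization trick
        logits = self.decoder(self.fc_dec(z).view(-1, 128, 1, 1))
        return logits, mu, logvar

    def loss(self, x):
        logits, mu, logvar = self(x)
        recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum") / x.size(0)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
        return recon + self.beta * kl                   # beta-VAE objective: recon + beta * KL


# Standard MNIST train split, as in the Dataset Splits row.
mnist = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(mnist, batch_size=64, shuffle=True)

model = BetaVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)     # learning rate is an assumption
for x, _ in islice(cycle(loader), 20_000):              # 20k iterations, as reported for MNIST
    opt.zero_grad()
    model.loss(x).backward()
    opt.step()
```

The CelebA models described in the same row would follow the same pattern with 3-channel inputs and 50k iterations; those architectural details are likewise not specified in the paper beyond the 5-layer CNN and ELU description.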