Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve
Authors: Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Baker Grosse
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we trained MR-VAEs to learn rate-distortion curves for image and text reconstruction tasks over a wide range of architectures. |
| Researcher Affiliation | Collaboration | University of Toronto, Vector Institute, Fujitsu Limited, Anthropic |
| Pseudocode | Yes | Algorithm 1 Multi-Rate Variational Autoencoders (MR-VAEs). Require: ψ (hypernetwork parameters), α (learning rate), (a, b) (sample range). While not converged do: sample a mini-batch B ∼ D_train; sample the hypernetwork input η ∼ U(log(a), log(b)); compute the MR-VAE objective Q(ψ) := L_exp(η)(ϕ_ψ(exp(η)), θ_ψ(exp(η)); B); update the hypernetwork parameters ψ ← ψ − α∇_ψQ(ψ); end while. (A runnable sketch of this loop follows the table.) |
| Open Source Code | Yes | We provide sample PyTorch code in Appendix B.2. |
| Open Datasets | Yes | We trained MR-VAEs on the MNIST dataset (Deng, 2012)... We trained convolution and ResNet-based architectures on binary static MNIST (Larochelle & Murray, 2011), Omniglot (Lake et al., 2015), CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and CelebA64 (Liu et al., 2015) datasets... Lastly, we trained autoregressive LSTM VAEs on the Yahoo dataset with the set-up from He et al. (2019)... We trained MR-VAEs composed of MLP encoders and decoders on the dSprites dataset, following the set-up from Chen et al. (2018). |
| Dataset Splits | Yes | For MNIST and Omniglot, we used a training set of 50,000, a validation set of 10,000, and a test set of 10,000. For CIFAR-10 and SVHN, we used a training set of 50,000, a test set of 10,000, and a validation set of 5,000. For CelebA, we used a training set of 27,000, a test set of 1,627, and a validation set of 1,627. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019) and JAX (Bradbury et al., 2018)" as deep learning frameworks but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | We used the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4 and a batch size of 64. For image reconstruction tasks, we trained the models for 500 epochs. For text reconstruction, we trained the models for 100 epochs... we use a fixed value a = 0.01 and b = 10 for our image and text reconstruction experiments. |
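To make the Pseudocode and Experiment Setup rows concrete, here is a minimal PyTorch sketch of the Algorithm 1 training loop wired to the reported hyperparameters (Adam with learning rate 1e-4, batch size 64, sample range a = 0.01, b = 10). This is not the authors' Appendix B.2 code: the hypernetwork-conditioned interface `model(x, beta)` and the Gaussian reconstruction/KL terms are illustrative assumptions.

```python
# Minimal sketch of Algorithm 1 (MR-VAE training), assuming a hypothetical
# hypernetwork-conditioned VAE with interface model(x, beta) -> (recon, mu, logvar).
import math
import random

import torch
import torch.nn.functional as F


def train_mr_vae(model, train_loader, a=0.01, b=10.0, epochs=500, lr=1e-4, device="cpu"):
    """Train an MR-VAE by sampling beta log-uniformly from [a, b] at every step."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # Adam, lr 1e-4 (paper setup)
    log_a, log_b = math.log(a), math.log(b)
    for _ in range(epochs):
        for x, _ in train_loader:                  # B ~ D_train (batch size 64 in the paper)
            x = x.to(device)
            eta = random.uniform(log_a, log_b)     # eta ~ U(log a, log b)
            beta = math.exp(eta)                   # hypernetwork input beta = exp(eta)
            recon, mu, logvar = model(x, beta)     # encoder/decoder weights produced by the
                                                   # hypernetwork conditioned on beta
            recon_loss = F.mse_loss(recon, x, reduction="sum") / x.size(0)
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
            loss = recon_loss + beta * kl          # Q(psi) = L_beta(phi_psi(beta), theta_psi(beta); B)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                       # psi <- psi - lr * grad_psi Q(psi)
    return model
```

Because η is drawn uniformly in log space, β is log-uniform over [a, b], so a single set of hypernetwork parameters is trained to cover the entire rate-distortion curve rather than one fixed β.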