Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

Authors: Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Baker Grosse

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we trained MR-VAEs to learn rate-distortion curves for image and text reconstruction tasks over a wide range of architectures.
Researcher Affiliation | Collaboration | University of Toronto, Vector Institute, Fujitsu Limited, Anthropic
Pseudocode | Yes | Algorithm 1 (Multi-Rate Variational Autoencoders, MR-VAEs). Require: ψ (hypernetwork parameters), η (learning rate), (a, b) (sample range). While not converged: sample a mini-batch B ∼ D_train; sample the hypernetwork input β̃ ∼ U(log(a), log(b)); compute the MR-VAE objective Q(ψ) := L_{exp(β̃)}(ϕ_ψ(exp(β̃)), θ_ψ(exp(β̃)); B); update the hypernetwork parameters ψ ← ψ − η ∇_ψ Q(ψ). (A PyTorch sketch of this loop follows the table.)
Open Source Code | Yes | We provide sample PyTorch code in Appendix B.2.
Open Datasets | Yes | We trained MR-VAEs on the MNIST dataset (Deng, 2012)... We trained convolution and ResNet-based architectures on binary static MNIST (Larochelle & Murray, 2011), Omniglot (Lake et al., 2015), CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and CelebA64 (Liu et al., 2015) datasets... Lastly, we trained autoregressive LSTM VAEs on the Yahoo dataset with the set-up from He et al. (2019)... We trained MR-VAEs composed of MLP encoders and decoders on the dSprites dataset, following the set-up from Chen et al. (2018).
Dataset Splits | Yes | For MNIST and Omniglot, we used a training set of 50,000, a validation set of 10,000, and a test set of 10,000. For CIFAR-10 and SVHN, we used a training set of 50,000, a test set of 10,000, and a validation set of 5,000. For CelebA, we used a training set of 27,000, a test set of 1,627, and a validation set of 1,627. (A split sketch for MNIST appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019) and JAX (Bradbury et al., 2018)" as deep learning frameworks but does not specify their versions or other software dependencies.
Experiment Setup | Yes | We used the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4 and a batch size of 64. For image reconstruction tasks, we trained the models for 500 epochs. For text reconstruction, we trained the models for 100 epochs... we use a fixed value a = 0.01 and b = 10 for our image and text reconstruction experiments. (These settings are collected in a configuration sketch after the table.)
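
To make the quoted pseudocode concrete, here is a minimal PyTorch sketch of one MR-VAE training step. The `encoder(x, beta)` and `decoder(z, beta)` interfaces are assumed stand-ins for the paper's β-conditioned hypernetwork modules, and the MSE reconstruction term is a placeholder likelihood; this is an illustration of Algorithm 1, not the authors' Appendix B.2 code.

```python
import math
import random

import torch
import torch.nn.functional as F


def mrvae_step(encoder, decoder, optimizer, batch, a=0.01, b=10.0):
    """One MR-VAE update mirroring Algorithm 1 (sketch, not the authors' code).

    Assumed interfaces: encoder(x, beta) -> (mu, logvar) of q(z|x),
    decoder(z, beta) -> reconstruction of x; both are beta-conditioned.
    """
    # Sample the hypernetwork input log-uniformly over [a, b].
    beta = math.exp(random.uniform(math.log(a), math.log(b)))

    mu, logvar = encoder(batch, beta)
    # Reparameterization trick: z = mu + sigma * eps.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = decoder(z, beta)

    # beta-weighted ELBO: reconstruction term plus beta * KL(q(z|x) || N(0, I)).
    # MSE is a placeholder; the paper uses dataset-appropriate likelihoods.
    recon_loss = F.mse_loss(recon, batch, reduction="sum") / batch.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    loss = recon_loss + beta * kl

    # psi <- psi - eta * grad_psi Q(psi), with the learning rate handled by the optimizer.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), beta
```

The log-uniform sampling and the default (a, b) = (0.01, 10) follow the quoted algorithm and experiment setup.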
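The MNIST numbers quoted under Dataset Splits (50,000 train / 10,000 validation / 10,000 test) can be obtained with a standard torchvision split. The snippet below is a generic sketch under assumed preprocessing (ToTensor only), not the authors' data pipeline.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# MNIST ships with 60,000 training and 10,000 test images; carving out a
# 50,000 / 10,000 train/validation split matches the quoted setup.
transform = transforms.ToTensor()
full_train = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

train_set, val_set = random_split(
    full_train, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch size 64 per the paper
val_loader = DataLoader(val_set, batch_size=64)
test_loader = DataLoader(test_set, batch_size=64)
```

The split seed here is arbitrary; the paper's quoted text does not specify how the validation subset was drawn.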
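Finally, the settings quoted under Experiment Setup translate into a small configuration block. The placeholder network below exists only so the optimizer construction is self-contained; the paper's MR-VAE encoder/decoder hypernetworks would take its place.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
LEARNING_RATE = 1e-4               # Adam (Kingma & Ba, 2014)
BATCH_SIZE = 64
EPOCHS_IMAGE = 500                 # image reconstruction tasks
EPOCHS_TEXT = 100                  # text reconstruction tasks
BETA_SAMPLE_RANGE = (0.01, 10.0)   # fixed (a, b) range for image and text experiments

# Placeholder model so this snippet runs as-is; not the MR-VAE architecture.
model = nn.Sequential(nn.Linear(784, 400), nn.ReLU(), nn.Linear(400, 784))
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```

EPOCHS_IMAGE and EPOCHS_TEXT correspond to the 500-epoch image and 100-epoch text training runs quoted above.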