Mixed-curvature Variational Autoencoders

Authors: Ondrej Skopek, Octavian-Eugen Ganea, Gary Bécigneul

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our models outperform current benchmarks on a synthetic tree dataset (Mathieu et al., 2019) and on image reconstruction on the MNIST (LeCun, 1998), Omniglot (Lake et al., 2015), and CIFAR (Krizhevsky, 2009) datasets for some latent space dimensions.
Researcher Affiliation | Collaboration | Ondrej Skopek (1), Octavian-Eugen Ganea (1,2) & Gary Bécigneul (1,2); (1) Department of Computer Science, ETH Zürich; (2) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Emails: oskopek@oskopek.com, oct@mit.edu, gary.becigneul@inf.ethz.ch. Ondrej Skopek (oskopek@google.com) is now at Google.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available on GitHub at https://github.com/oskopek/mvae.
Open Datasets | Yes | For our experiments, we use four datasets: (i) Branching diffusion process (Mathieu et al., 2019, BDP), a synthetic tree-like dataset with injected noise, (ii) dynamically-binarized MNIST digits (LeCun, 1998) ... (iii) dynamically-binarized Omniglot characters (Lake et al., 2015) ... and (iv) CIFAR-10 (Krizhevsky, 2009).
Dataset Splits | Yes | All models in all datasets are trained with early stopping on training ELBO with a lookahead of 50 epochs and a warmup of 100 epochs (Bowman et al., 2016). ... the training set is binarized dynamically ... and the evaluation set is done with a fixed binarization. (See the binarization sketch after the table.)
Hardware Specification | No | The paper mentions "the Leonhard cluster, and ETH Zürich for GPU access." This is a general statement and does not specify exact GPU models, CPU types, or other hardware details.
Software Dependencies | No | The paper mentions using the "Adam (Kingma & Ba, 2015) optimizer" and PyTorch implicitly through a reference, but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | All models in all datasets are trained with early stopping on training ELBO with a lookahead of 50 epochs and a warmup of 100 epochs (Bowman et al., 2016). All BDP models are trained for 1000 epochs, MNIST and Omniglot models are trained for 300 epochs, and CIFAR for 200 epochs. ... Specifically, we use the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 10^-3 and standard settings of β1 = 0.9, β2 = 0.999, and ε = 10^-8. (See the training-loop sketch after the table.)
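
To make the dynamic-versus-fixed binarization quoted in the Dataset Splits row concrete, here is a minimal PyTorch sketch. It is an illustration under assumptions, not the authors' pipeline: the class name `DynamicallyBinarizedMNIST` and its constructor arguments are hypothetical; only the idea of resampling binary pixels on every access for training while drawing one fixed Bernoulli sample for evaluation comes from the quoted text.

```python
import torch
from torch.utils.data import Dataset
from torchvision import datasets, transforms

# Hypothetical sketch of dynamic binarization (not the authors' code):
# training images are re-binarized from their grayscale intensities on every
# access, while the evaluation split uses a single fixed Bernoulli draw.
class DynamicallyBinarizedMNIST(Dataset):
    def __init__(self, root, train=True, fixed=False, seed=0):
        self.base = datasets.MNIST(root, train=train, download=True,
                                   transform=transforms.ToTensor())
        self.fixed = fixed
        if fixed:
            # Pre-draw one binarization so evaluation metrics are stable.
            g = torch.Generator().manual_seed(seed)
            imgs = torch.stack([self.base[i][0] for i in range(len(self.base))])
            self.binarized = torch.bernoulli(imgs, generator=g)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        if self.fixed:
            return self.binarized[idx], self.base[idx][1]
        img, label = self.base[idx]
        # Fresh Bernoulli sample on each access -> dynamic binarization.
        return torch.bernoulli(img), label
```

The same pattern would apply to the Omniglot experiments by swapping in the corresponding base dataset class.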
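
The Experiment Setup row lists concrete optimizer settings and an early-stopping rule; the skeleton below shows one way they could fit together. It is a hedged sketch, not the released code: `model`, `compute_elbo`, and `train_loader` are placeholder names supplied by the caller, and only the Adam hyperparameters (lr = 10^-3, β1 = 0.9, β2 = 0.999, ε = 10^-8), the 100-epoch warmup, and the 50-epoch lookahead are taken from the quote.

```python
import torch

def train(model, train_loader, compute_elbo, epochs, warmup=100, lookahead=50,
          device="cpu"):
    """Hypothetical training loop: maximize the ELBO with Adam and stop early
    once the training ELBO has not improved for `lookahead` epochs, but never
    before `warmup` epochs have passed."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), eps=1e-8)
    best_elbo, best_epoch = float("-inf"), 0
    for epoch in range(epochs):
        epoch_elbo, n_batches = 0.0, 0
        for x, _ in train_loader:
            x = x.to(device)
            elbo = compute_elbo(model, x)  # placeholder: caller-supplied ELBO estimate
            loss = -elbo                   # maximizing ELBO = minimizing -ELBO
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_elbo += elbo.item()
            n_batches += 1
        epoch_elbo /= max(n_batches, 1)
        if epoch_elbo > best_elbo:
            best_elbo, best_epoch = epoch_elbo, epoch
        # Early stopping on training ELBO: only after the warmup period,
        # and only if there was no improvement for `lookahead` epochs.
        if epoch >= warmup and epoch - best_epoch >= lookahead:
            break
    return model
```

Per the quoted setup, `epochs` would be 1000 for BDP, 300 for MNIST and Omniglot, and 200 for CIFAR.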