Monte Carlo Variational Auto-Encoders
Authors: Achille Thin, Nikita Kotelevskii, Arnaud Doucet, Alain Durmus, Eric Moulines, Maxim Panov
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these new methods to build novel Monte Carlo VAEs, and show their efficiency on real-world datasets. |
| Researcher Affiliation | Academia | ¹CMAP, École Polytechnique, Université Paris-Saclay, France ²CDISE, Skolkovo Institute of Science and Technology, Moscow, Russia ³École Normale Supérieure Paris-Saclay, France ⁴HDI Lab, HSE University, Moscow, Russia ⁵University of Oxford. |
| Pseudocode | Yes | Algorithm 1 Langevin Monte Carlo VAE |
| Open Source Code | Yes | The code to reproduce all of the experiments is available online at https://github.com/premolab/metflow/. |
| Open Datasets | Yes | We evaluate our models on three different datasets: MNIST, CIFAR-10 and CelebA. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits with specific details, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions 'held-out loglikelihood' but not how the split was made. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | All the models are implemented using PyTorch (Paszke et al., 2019) and optimized using the Adam optimizer (Kingma & Ba, 2014) for 100 epochs each. The training process is using the PyTorch Lightning toolkit (Falcon, 2019). |
| Experiment Setup | Yes | A crucial hyperparameter of our method is the step size η. In principle, it could be learned by including it as an additional inference parameter φ and by maximizing the ELBO. However, it is then difficult to find a good tradeoff between having a high A/R ratio and a large step size η at the same time. Instead, we suggest adjusting η by targeting a fixed A/R ratio ρ. It has proven effective to use a preconditioned version of (11), i.e. $Z_k = Z_{k-1} + \eta \odot \nabla \log \gamma_k(Z_{k-1}) + \sqrt{2\eta} \odot U_k$ with $\eta \in \mathbb{R}^p$, where we adapt each component of η using the following rule: $\eta^{(i)} = 0.9\,\eta^{(i)} + 0.1\,\eta_0 / (\epsilon + \mathrm{std}[\partial_{z^{(i)}} \log p_\theta(x, z)])$. Here std denotes the standard deviation over the batch x of the quantity $\partial_{z^{(i)}} \log p_\theta(x, z)$, and ϵ > 0. The scalar $\eta_0$ is a tuning parameter which is adjusted to target the A/R ratio ρ. This strategy follows the same heuristics as Adam (Kingma & Ba, 2014). In the following, ρ is set to 0.8 for A-MCVAE and 0.9 for L-MCVAE (keeping it high for L-MCVAE ensures that the Langevin dynamics stays almost reversible, thus keeping a low-variance SIS estimator). An optimal choice of the temperature schedule $\{\beta_k\}_{k=0}^{K}$ for SIS and AIS is a difficult problem. We have focused in our experiments on three different settings. First, we consider the temperature schedule fixed and regularly spaced between 0 and 1. Following (Grosse et al., 2015), the second option is the sigmoidal tempering scheme where $\beta_k = (\tilde{\beta}_k - \tilde{\beta}_1)/(\tilde{\beta}_K - \tilde{\beta}_1)$ with $\tilde{\beta}_k = \sigma(\delta(2k/K - 1))$, where σ is the sigmoid function and δ > 0 is a parameter that we optimize during the training phase. The last schedule consists in learning the temperatures $\{\beta_k\}_{k=0}^{K}$ directly as additional inference parameters φ. |
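
For readers reimplementing the setup quoted above, the following PyTorch sketch illustrates the component-wise step-size adaptation, the preconditioned Langevin update, and the sigmoidal tempering schedule. It is not the authors' released code (see the linked repository); the function names, the `eps` constant, and the normalization of the sigmoidal schedule by its first and last values are assumptions made for illustration.

```python
import torch

def adapt_step_size(eta, grad_log_p, eta0, eps=1e-8):
    """Adaptation rule eta_i <- 0.9*eta_i + 0.1*eta0 / (eps + std[d/dz_i log p(x, z)]).

    eta:        (p,) current component-wise step sizes
    grad_log_p: (batch, p) gradients of log p_theta(x, z) w.r.t. z over the batch
    eta0:       scalar tuned so that the target A/R ratio rho is reached
    """
    grad_std = grad_log_p.std(dim=0)  # standard deviation over the batch, per component
    return 0.9 * eta + 0.1 * eta0 / (eps + grad_std)

def preconditioned_langevin_step(z, grad_log_gamma, eta):
    """Z_k = Z_{k-1} + eta * grad log gamma_k(Z_{k-1}) + sqrt(2*eta) * U_k, component-wise."""
    noise = torch.randn_like(z)
    return z + eta * grad_log_gamma + torch.sqrt(2.0 * eta) * noise

def sigmoidal_schedule(K, delta):
    """Sigmoidal tempering (Grosse et al., 2015): beta_tilde_k = sigmoid(delta*(2k/K - 1)),
    normalized by its first and last values so the schedule runs from 0 to 1."""
    k = torch.arange(K + 1, dtype=torch.float32)
    beta_tilde = torch.sigmoid(delta * (2.0 * k / K - 1.0))
    return (beta_tilde - beta_tilde[0]) / (beta_tilde[-1] - beta_tilde[0])

# Illustrative shapes only: z and grads are (batch, p); eta starts as a (p,) tensor.
# eta   = adapt_step_size(eta, grads, eta0=1e-2)
# z     = preconditioned_langevin_step(z, grads, eta)
# betas = sigmoidal_schedule(K=5, delta=3.0)
```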