Transport meets Variational Inference: Controlled Monte Carlo Diffusions
Authors: Francisco Vargas, Shreyas Padhy, Denis Blessing, Nikolas Nüsken
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now empirically demonstrate the performance of the proposed CMCD sampler (Eq. 24) in both underdamped (detailed in Appendix D) and overdamped (CMCD (OD), Appendix D.6) formulations on a series of sampling benchmarks. We first replicate the benchmarks from Geffner & Domke (2023) on 6 standard target benchmark distributions. |
| Researcher Affiliation | Academia | Francisco Vargas, Shreyas Padhy (University of Cambridge, Cambridge, UK; {fav25,sp2058}@cam.ac.uk); Denis Blessing (KIT, Karlsruhe, Germany; jl8142@kit.edu); Nikolas Nüsken (King's College London, London, UK; nik.nuesken@gmx.de) |
| Pseudocode | Yes | Algorithm 1, Controlled Monte Carlo Diffusions: sampling and normalizing constant estimation (an illustrative sketch of this structure follows the table) |
| Open Source Code | Yes | Finally, we have explored the CMCD inference scheme obtaining state-of-the-art results across a suite of challenging inference benchmarks. We believe this experimental success is partly due to our approach striking a balance between parametrising a flexible family of distributions whilst being constrained enough such that learning the sampler is not overly expensive (Tzen & Raginsky, 2019b; Vargas et al., 2023c). Future directions can explore optimal schemes for the annealed flow $\pi_t$ (Goshtasbpour et al., 2023) and alternate divergences (Nüsken & Richter, 2021; Richter & Berner, 2024; Midgley et al., 2022). https://github.com/shreyaspadhy/CMCD |
| Open Datasets | Yes | log sonar (d = 61) and log ionosphere (d = 35) are Bayesian logistic regression models... brownian (d = 32) corresponds to the time discretisation of a Brownian motion... lorenz (d = 90) is the discretisation of a highly stiff 3-dimensional SDE... seeds (d = 26) is a random effect regression model... lgcp (d = 1600) is a high-dimensional Log Gaussian Cox process popular in spatial statistics (Møller et al., 1998)... funnel (d = 10) is a challenging distribution given by $\pi_T(x_{1:10}) = \mathcal{N}(x_1; 0, \sigma_f^2)\,\mathcal{N}(x_{2:10}; 0, \exp(x_1)I)$, with $\sigma_f^2 = 9$ (Neal, 2003); a reference density sketch follows the table... gmm (d = 2) is a two-dimensional Gaussian mixture model with three modes... |
| Dataset Splits | Yes | For the generative modelling tasks we use 30 time steps and train for 100 epochs, whilst for the double well we train all experiments for 17 epochs (early stopping via the validation set) with 60 discretisation steps. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments; hardware is only implied by the general machine-learning context. |
| Software Dependencies | No | The paper mentions software components like "ADAM" and "Python Optimal Transport" but does not specify their version numbers, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | We first pretrain the source distribution to a mean-field Gaussian distribution trained for 150,000 steps with ADAM and a learning rate of $10^{-2}$. We then train for 150,000 iterations with a batch size of 5, tuning the learning rate over $[10^{-5}, 10^{-4}, 10^{-3}]$ and picking the best one based on mean ELBO after training... We select the optimal learning rate in $[10^{-3}, 10^{-4}, 10^{-5}]$, the optimal standard deviation of the source distribution $\sigma_{\text{init}}$ in $[1, 2, 3, 4, 5]$, and the optimal $\alpha$ in $[0.1, 0.5, 1, 1.5, 2]$. Instead of training $\epsilon = \delta_t \sigma$, we sweep over an optimal value in $[10^{-2}, 10^{-1}, 1]$. The models are trained with a batch size of 300 for 11,000 steps, where we keep the source distribution parameters fixed, as well as $\epsilon$. (A sketch of this grid sweep follows the table.) |
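To make the quoted Algorithm 1 concrete, here is a minimal numpy sketch of the generic structure its title names: simulate an annealed Langevin diffusion with an optional additive control from a Gaussian reference towards the target, and accumulate log-importance-weights for the normalizing constant. Everything here (the name `cmcd_style_sampler`, the linear annealing schedule, the AIS-style weight increment) is our assumption for illustration; the actual CMCD learns the control by optimising a path-space divergence and derives its weights from the forward/backward path measures, as implemented in the linked repository.

```python
import numpy as np

def cmcd_style_sampler(log_target, grad_log_target, d,
                       n_particles=1000, n_steps=100, step_size=1e-2,
                       control=None, seed=0):
    """Hypothetical sketch of the structure named by Algorithm 1: run an
    annealed Langevin diffusion with an optional additive control and
    accumulate log-weights for a normalizing-constant estimate. The weights
    here are simplified AIS-style increments, not CMCD's path-measure
    Radon-Nikodym weights."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_particles, d))            # reference samples
    log_prior = lambda z: -0.5 * (np.sum(z**2, axis=-1)  # standard normal
                                  + d * np.log(2 * np.pi))
    log_w = np.zeros(n_particles)

    for k in range(1, n_steps + 1):
        beta_prev, beta = (k - 1) / n_steps, k / n_steps  # linear schedule
        # weight increment from moving the annealed target pi_beta forward
        log_w += (beta - beta_prev) * (log_target(x) - log_prior(x))
        # Euler-Maruyama step of the annealed (optionally controlled) SDE
        drift = (1 - beta) * (-x) + beta * grad_log_target(x)
        if control is not None:
            drift = drift + control(x, beta)
        x = (x + step_size * drift
               + np.sqrt(2 * step_size) * rng.standard_normal(x.shape))

    # importance-weighted estimate of log Z for the unnormalized target
    log_Z = np.logaddexp.reduce(log_w) - np.log(n_particles)
    return x, log_Z
```

With `control=None` this reduces to a plain annealed Langevin / AIS scheme; the learned control is what distinguishes CMCD from that baseline.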
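The funnel target quoted in the Open Datasets row is fully specified by its formula, so a reference implementation is easy to state. The function names and vectorisation below are ours, but the density is exactly the quoted $\mathcal{N}(x_1; 0, \sigma_f^2)\,\mathcal{N}(x_{2:10}; 0, \exp(x_1)I)$ with $\sigma_f^2 = 9$; `funnel_sample` gives exact samples for checking a sampler's output against ground truth.

```python
import numpy as np

def funnel_log_density(x, sigma_f2=9.0):
    """Log-density of Neal's funnel as quoted above:
    pi_T(x_{1:10}) = N(x_1; 0, sigma_f^2) * N(x_{2:10}; 0, exp(x_1) I)."""
    x = np.atleast_2d(x)
    x1, rest = x[:, 0], x[:, 1:]
    d_rest = rest.shape[1]
    log_p1 = -0.5 * (x1**2 / sigma_f2 + np.log(2 * np.pi * sigma_f2))
    var = np.exp(x1)                       # conditional variance of x_{2:d}
    log_p2 = -0.5 * (np.sum(rest**2, axis=1) / var
                     + d_rest * (np.log(2 * np.pi) + x1))
    return log_p1 + log_p2

def funnel_sample(n, d=10, sigma_f2=9.0, seed=0):
    """Exact ancestral samples: draw x_1, then x_{2:d} given x_1."""
    rng = np.random.default_rng(seed)
    x1 = np.sqrt(sigma_f2) * rng.standard_normal(n)
    rest = np.exp(0.5 * x1)[:, None] * rng.standard_normal((n, d - 1))
    return np.concatenate([x1[:, None], rest], axis=1)
```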
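Finally, the hyperparameter grid in the Experiment Setup row can be written down directly. `train_and_elbo` is a hypothetical stand-in for the actual training routine, which we do not reproduce; the "best mean ELBO after training" selection rule is taken from the quote, and we combine the quote's two separate sweeps (learning rate; then $\sigma_{\text{init}}$, $\alpha$, $\epsilon$) into one grid purely for brevity.

```python
import itertools

# Grids as quoted in the setup; combining them into one product is our
# simplification, not the paper's exact protocol.
LEARNING_RATES = [1e-3, 1e-4, 1e-5]
SIGMA_INITS    = [1, 2, 3, 4, 5]
ALPHAS         = [0.1, 0.5, 1, 1.5, 2]
EPSILONS       = [1e-2, 1e-1, 1]   # swept instead of training eps = dt * sigma

def sweep(train_and_elbo):
    """Pick the configuration with the best mean ELBO after training."""
    best = None
    for lr, s, a, eps in itertools.product(
            LEARNING_RATES, SIGMA_INITS, ALPHAS, EPSILONS):
        elbo = train_and_elbo(lr=lr, sigma_init=s, alpha=a, eps=eps,
                              batch_size=300, n_steps=11_000)
        if best is None or elbo > best[0]:
            best = (elbo, dict(lr=lr, sigma_init=s, alpha=a, eps=eps))
    return best
```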