Score-Based Diffusion meets Annealed Importance Sampling
Authors: Arnaud Doucet, Will Grathwohl, Alexander G. de G. Matthews, Heiko Strathmann
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): We run a number of experiments estimating normalizing constants to validate our approach, MCD, and compare to differentiable AIS with ULA [53, 51] and Unadjusted Hamiltonian Annealing (UHA) [18, 54]. We first investigate the performance of these approaches on static target distributions using the same, fixed initial distribution and annealing schedule. Finally, we explore the performance of the methods for VAEs. Here, being the most expensive of our experiments, we include runtime comparisons of our method against the baselines. Additional results on a Normalizing Flow target can be found in Appendix F.2. Full experimental details, chosen hyper-parameters, and model architectures can be found in Appendix E. |
| Researcher Affiliation | Industry | Arnaud Doucet, Will Grathwohl, Alexander G. de G. Matthews & Heiko Strathmann, DeepMind {arnauddoucet,wgrathwohl,alexmatthews,strathmann}@google.com |
| Pseudocode | Yes | Algorithm 1: Unadjusted Langevin AIS/MCD (red instructions for AIS and blue for MCD). A hedged sketch of such a transition step is given below the table. |
| Open Source Code | No | We did not publish code. |
| Open Datasets | Yes | We train a VAE on the binarized MNIST dataset [43], re-using architectures proposed in [18, 8] (two-layer MLP encoder/decoder, Bernoulli likelihood). All generative models use the same architecture and hyper-parameters. We compare standard amortized variational inference with annealed ULA and UHA with standard AIS backward transition kernels, as well as ULA and UHA with our MCD transition kernels. We match the number of sampler steps between ULA/ULA-MCD and UHA/UHA-MCD to 64 and 32 respectively. ELBO and log-likelihood values on the test set are presented in Table 3. A minimal sketch of a VAE of this kind is given below the table. |
| Dataset Splits | No | The paper uses a 'test set' for evaluation but does not provide specific percentages, sample counts, or a detailed methodology for how the training, validation, and test splits were created for its datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only reports 'Iteration time' and 'Total time'. |
| Software Dependencies | No | The paper mentions general tools like JAX in the references ([4], [6]) but does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Full experimental details, chosen hyper-parameters, and model architectures can be found in Appendix E. For all methods, sampling step-sizes per timestep are tuned via gradient descent to maximize the ELBO, and the diagonal mass matrix is learned for the Hamiltonian samplers. Our score model is parameterized by an MLP with residual connections that is conditioned on integration time t, and on the momentum term for the Hamiltonian case (see Algorithm 2). For an ablation on various network architectures we refer the reader to Appendix F.3. A hedged sketch of a time-conditioned residual score network is given below the table. |
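
To make the Pseudocode row concrete, here is a minimal sketch, assuming a geometric annealing path and Gaussian ULA kernels, of one unadjusted-Langevin AIS sweep in JAX; passing a learned `score_net` swaps the backward-kernel drift for the learned score, in the spirit of the MCD variant. The function names (`log_gamma_0`, `log_gamma_T`, `score_net`) and the weight bookkeeping are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of unadjusted-Langevin AIS / MCD (not the authors' code).
import jax
import jax.numpy as jnp

def log_gamma(x, beta, log_gamma_0, log_gamma_T):
    """Geometric annealing path between base and target log-densities."""
    return (1.0 - beta) * log_gamma_0(x) + beta * log_gamma_T(x)

def gaussian_logpdf(x, mean, var):
    """Log-density of an isotropic Gaussian with scalar variance `var`."""
    return -0.5 * jnp.sum((x - mean) ** 2 / var + jnp.log(2.0 * jnp.pi * var))

def ula_ais_sweep(key, x, betas, eta, log_gamma_0, log_gamma_T, score_net=None):
    """One annealing sweep; returns the final sample and accumulated log-weight.

    With score_net=None this is plain ULA-AIS; otherwise the backward-kernel
    drift uses the learned score (MCD-style)."""
    log_w = 0.0
    for beta_prev, beta_k in zip(betas[:-1], betas[1:]):
        key, sub = jax.random.split(key)
        grad_k = jax.grad(lambda y: log_gamma(y, beta_k, log_gamma_0, log_gamma_T))
        # Forward ULA proposal targeting the k-th annealed density.
        mean_fwd = x + eta * grad_k(x)
        x_new = mean_fwd + jnp.sqrt(2.0 * eta) * jax.random.normal(sub, x.shape)
        # Backward kernel mean: ULA reversal (AIS) or learned score drift (MCD).
        if score_net is None:
            mean_bwd = x_new + eta * grad_k(x_new)
        else:
            mean_bwd = x_new + eta * score_net(x_new, beta_k)
        # Log-weight increment: annealed-density ratio plus the
        # backward/forward kernel correction.
        log_w += (log_gamma(x_new, beta_k, log_gamma_0, log_gamma_T)
                  - log_gamma(x, beta_prev, log_gamma_0, log_gamma_T)
                  + gaussian_logpdf(x, mean_bwd, 2.0 * eta)
                  - gaussian_logpdf(x_new, mean_fwd, 2.0 * eta))
        x = x_new
    return x, log_w

# Toy example: standard Gaussian base, shifted Gaussian target.
base = lambda x: -0.5 * jnp.sum(x ** 2)
target = lambda x: -0.5 * jnp.sum((x - 3.0) ** 2)
betas = jnp.linspace(0.0, 1.0, 17)
x0 = jax.random.normal(jax.random.PRNGKey(0), (2,))
print(ula_ais_sweep(jax.random.PRNGKey(1), x0, betas, 0.05, base, target))
```

Summing the per-step increments telescopes to the usual AIS log-weight, so averaging `exp(log_w)` over independent sweeps gives a normalizing-constant estimate when the base density is normalized.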
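For the Open Datasets row, a minimal sketch, assuming a VAE with two-layer MLP encoder and decoder and a Bernoulli likelihood as described in the quote; the hidden width (200) and latent dimension (32) are illustrative guesses, not the paper's hyper-parameters.

```python
# Minimal VAE ELBO sketch for binarized MNIST (illustrative sizes only).
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Simple 1/sqrt(fan_in) initialisation for a stack of dense layers."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def elbo(params, key, x, latent_dim=32):
    """Single-sample ELBO for one binarized image x (flattened to 784 pixels)."""
    enc, dec = params
    # Encoder outputs mean and log-variance of q(z | x).
    h = mlp(enc, x)
    mu, logvar = h[:latent_dim], h[latent_dim:]
    z = mu + jnp.exp(0.5 * logvar) * jax.random.normal(key, mu.shape)
    # Bernoulli likelihood on the binarized pixels.
    logits = mlp(dec, z)
    log_px_z = jnp.sum(x * jax.nn.log_sigmoid(logits)
                       + (1.0 - x) * jax.nn.log_sigmoid(-logits))
    # Closed-form KL(q(z | x) || N(0, I)).
    kl = 0.5 * jnp.sum(jnp.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return log_px_z - kl

latent_dim, pixels, hidden = 32, 784, 200
k_enc, k_dec, k_z = jax.random.split(jax.random.PRNGKey(0), 3)
params = (init_mlp(k_enc, [pixels, hidden, 2 * latent_dim]),   # encoder
          init_mlp(k_dec, [latent_dim, hidden, pixels]))       # decoder
x = jnp.zeros(pixels)  # stand-in for one binarized MNIST image
print(elbo(params, k_z, x))
```

In the paper's comparisons, the annealed samplers (ULA/UHA with AIS or MCD backward kernels) would refine the encoder's approximate posterior rather than using this single-sample ELBO directly.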
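For the Experiment Setup row, an illustrative sketch of a residual MLP score network conditioned on the integration time t. The depth, width, and the way time is appended to the input are assumptions, not the paper's architecture (which is detailed in its Appendix E); in the Hamiltonian case the momentum term would be concatenated to the input as well.

```python
# Hypothetical time-conditioned residual MLP score network (illustrative only).
import jax
import jax.numpy as jnp

def init_score_net(key, dim, hidden=256, blocks=2):
    def dense(k, m, n):
        return jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n)
    keys = jax.random.split(key, 2 * blocks + 2)
    return {"inp": dense(keys[0], dim + 1, hidden),  # +1 for the time input
            "out": dense(keys[1], hidden, dim),
            "blocks": [(dense(keys[2 + 2 * i], hidden, hidden),
                        dense(keys[3 + 2 * i], hidden, hidden))
                       for i in range(blocks)]}

def score_net(params, x, t):
    """Approximate score at position x and scalar time t in [0, 1]."""
    h = jnp.concatenate([x, jnp.atleast_1d(t)])
    W, b = params["inp"]
    h = jax.nn.relu(h @ W + b)
    for (W1, b1), (W2, b2) in params["blocks"]:
        r = jax.nn.relu(h @ W1 + b1)
        r = r @ W2 + b2
        h = jax.nn.relu(h + r)  # residual connection
    W, b = params["out"]
    return h @ W + b

params = init_score_net(jax.random.PRNGKey(0), dim=2)
print(score_net(params, jnp.zeros(2), 0.5))
```

Because the whole annealed sampler is differentiable, the per-timestep step sizes and these network parameters can in principle be tuned jointly by gradient ascent on the ELBO, which matches the tuning strategy described in the row above.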