Score-Based Diffusion meets Annealed Importance Sampling

Authors: Arnaud Doucet, Will Grathwohl, Alexander G. Matthews, Heiko Strathmann

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | We run a number of experiments estimating normalizing constants to validate our approach, MCD, and compare to differentiable AIS with ULA [53, 51] and Unadjusted Hamiltonian Annealing (UHA) [18, 54]. We first investigate the performance of these approaches on static target distributions using the same fixed initial distribution and annealing schedule. Finally, we explore the performance of the methods for VAEs. Here, as these are the most expensive of our experiments, we include runtime comparisons of our method against the baselines. Additional results on a Normalizing Flow target can be found in Appendix F.2. Full experimental details, chosen hyper-parameters, and model architectures can be found in Appendix E.
Researcher Affiliation | Industry | Arnaud Doucet, Will Grathwohl, Alexander G. de G. Matthews & Heiko Strathmann, DeepMind, {arnauddoucet,wgrathwohl,alexmatthews,strathmann}@google.com
Pseudocode | Yes | Algorithm 1: Unadjusted Langevin AIS/MCD (red instructions for AIS and blue for MCD). (A hedged JAX sketch of ULA-based AIS appears after this table.)
Open Source Code | No | We did not publish code.
Open Datasets | Yes | We train a VAE on the binarized MNIST dataset [43], re-using architectures proposed in [18, 8] (two-layer MLP encoder/decoder, Bernoulli likelihood). All generative models use the same architecture and hyper-parameters. We compare standard amortized variational inference with annealed ULA and UHA with standard AIS backward transition kernels, as well as ULA and UHA with our MCD transition kernels. We match the number of sampler steps between ULA/ULA-MCD and UHA/UHA-MCD to 64 and 32 respectively. ELBO and log-likelihood values on the test set are presented in Table 3. (A minimal VAE/ELBO sketch appears after this table.)
Dataset Splits | No | The paper uses a 'test set' for evaluation but does not provide specific percentages, sample counts, or a detailed methodology for how the training, validation, and test splits were created for its datasets.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only reports 'Iteration time' and 'Total time'.
Software Dependencies | No | The paper mentions general tools like JAX in the references ([4], [6]) but does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Full experimental details, chosen hyper-parameters, and model architectures can be found in Appendix E. For all methods, sampling step-sizes per-timestep are tuned via gradient descent to maximize the ELBO, and the diagonal mass matrix is learned for the Hamiltonian samplers. Our score model is parameterized by an MLP with residual connections that is conditioned on integration time t, and on the momentum term for the Hamiltonian case (see Algorithm 2). For an ablation on various network architectures we refer the reader to Appendix F.3. (A hedged sketch of such a time-conditioned residual score network appears after this table.)
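
To make the Algorithm 1 row concrete, here is a minimal JAX sketch of annealed importance sampling with unadjusted Langevin (ULA) transitions, in the spirit of the differentiable-AIS baselines the paper compares against. It is not the authors' code: the toy Gaussian target, the geometric annealing path, and the use of the standard AIS incremental weight (which assumes each transition approximately leaves the current annealed target invariant) are illustrative simplifications, and the learned score correction to the backward kernel that defines MCD is omitted.

```python
# Hedged sketch of AIS with unadjusted Langevin (ULA) transitions, in JAX.
# Toy target, annealing schedule, and weight approximation are illustrative
# assumptions, not the paper's Algorithm 1.
import jax
import jax.numpy as jnp

def log_gamma_0(x):
    # Unnormalized log-density of the initial distribution (standard Gaussian).
    return -0.5 * jnp.sum(x ** 2)

def log_gamma_T(x):
    # Unnormalized log-density of a toy target (mean-shifted Gaussian); a
    # stand-in for the static targets used in the paper's experiments.
    return -0.5 * jnp.sum((x - 2.0) ** 2)

def log_gamma(x, beta):
    # Geometric annealing path between the initial and final densities.
    return (1.0 - beta) * log_gamma_0(x) + beta * log_gamma_T(x)

def ula_ais(key, dim=2, num_steps=64, step_size=1e-2):
    """One AIS trajectory with ULA transitions; returns (final state, log weight)."""
    betas = jnp.linspace(0.0, 1.0, num_steps + 1)
    key, sub = jax.random.split(key)
    x = jax.random.normal(sub, (dim,))        # exact sample from the initial Gaussian
    log_w = 0.0
    grad_log_gamma = jax.grad(log_gamma, argnums=0)
    for k in range(1, num_steps + 1):
        # Standard AIS incremental weight; exact if the transition leaves the
        # new annealed target invariant, approximate for unadjusted Langevin.
        log_w = log_w + log_gamma(x, betas[k]) - log_gamma(x, betas[k - 1])
        # Unadjusted Langevin step targeting the new annealed density.
        key, sub = jax.random.split(key)
        noise = jax.random.normal(sub, (dim,))
        x = x + step_size * grad_log_gamma(x, betas[k]) + jnp.sqrt(2.0 * step_size) * noise
    return x, log_w

# Estimate log(Z_T / Z_0) as the log-mean-exp of weights over many trajectories.
keys = jax.random.split(jax.random.PRNGKey(0), 512)
_, log_ws = jax.vmap(ula_ais)(keys)
log_Z_hat = jax.scipy.special.logsumexp(log_ws) - jnp.log(log_ws.shape[0])
print(log_Z_hat)
```

The final lines estimate the log-normalizing-constant ratio as the log-mean-exp of the importance weights over independent trajectories, which is the quantity the paper's normalizing-constant experiments evaluate.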
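The Open Datasets row describes a two-layer MLP encoder/decoder VAE with a Bernoulli likelihood on binarized MNIST. Below is a minimal sketch of the single-sample ELBO for such a model; the layer widths, latent dimension, and initialization are our own assumptions, not values from the paper or its Appendix E.

```python
# Hedged sketch (not the authors' code) of a two-layer MLP VAE with a
# Bernoulli likelihood, as in the quoted binarized-MNIST setup.
import jax
import jax.numpy as jnp

DATA_DIM, HIDDEN, LATENT = 784, 256, 32   # assumed sizes

def init_mlp(key, sizes):
    # Parameters of a two-layer MLP as a list of (W, b) pairs.
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) * 0.01,
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def elbo(params, key, x):
    """Single-sample ELBO with a Gaussian encoder and Bernoulli decoder."""
    enc, dec = params
    h = mlp(enc, x)                                   # encoder outputs mean and log-variance
    mu, log_var = h[:LATENT], h[LATENT:]
    eps = jax.random.normal(key, (LATENT,))
    z = mu + jnp.exp(0.5 * log_var) * eps             # reparameterization trick
    logits = mlp(dec, z)                              # Bernoulli logits over pixels
    log_px_z = -jnp.sum(jnp.maximum(logits, 0) - logits * x
                        + jnp.log1p(jnp.exp(-jnp.abs(logits))))  # Bernoulli log-likelihood
    kl = 0.5 * jnp.sum(jnp.exp(log_var) + mu ** 2 - 1.0 - log_var)  # KL(q(z|x) || N(0, I))
    return log_px_z - kl

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = (init_mlp(k1, [DATA_DIM, HIDDEN, 2 * LATENT]),   # encoder
          init_mlp(k2, [LATENT, HIDDEN, DATA_DIM]))       # decoder
x = (jax.random.uniform(k3, (DATA_DIM,)) > 0.5).astype(jnp.float32)  # stand-in binarized image
print(elbo(params, k4, x))
```

In the paper's VAE experiments this standard amortized ELBO is the baseline that the annealed ULA/UHA and MCD variants improve upon by running sampler steps between the encoder and the decoder.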
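The Experiment Setup row mentions a score model parameterized by an MLP with residual connections, conditioned on the integration time t. The following is a hedged sketch of such a network; the width, depth, activation, and the simple concatenation-based time conditioning are illustrative assumptions rather than the architecture detailed in Appendix E.

```python
# Hedged sketch of a time-conditioned residual MLP score network; sizes and
# conditioning scheme are our own assumptions, not the authors' architecture.
import jax
import jax.numpy as jnp

HIDDEN, DEPTH = 128, 3   # assumed width and number of residual blocks

def init_score_net(key, dim):
    params = {"inp": None, "blocks": [], "out": None}
    key, sub = jax.random.split(key)
    params["inp"] = (jax.random.normal(sub, (dim + 1, HIDDEN)) * 0.01, jnp.zeros(HIDDEN))
    for _ in range(DEPTH):
        key, k1, k2 = jax.random.split(key, 3)
        params["blocks"].append(
            ((jax.random.normal(k1, (HIDDEN, HIDDEN)) * 0.01, jnp.zeros(HIDDEN)),
             (jax.random.normal(k2, (HIDDEN, HIDDEN)) * 0.01, jnp.zeros(HIDDEN))))
    key, sub = jax.random.split(key)
    params["out"] = (jax.random.normal(sub, (HIDDEN, dim)) * 0.01, jnp.zeros(dim))
    return params

def score_net(params, x, t):
    """Approximate score at position x and integration time t in [0, 1]."""
    h = jnp.concatenate([x, jnp.atleast_1d(t)])        # condition on t by concatenation
    W, b = params["inp"]
    h = jax.nn.silu(h @ W + b)
    for (W1, b1), (W2, b2) in params["blocks"]:
        r = jax.nn.silu(h @ W1 + b1)
        r = r @ W2 + b2
        h = h + r                                      # residual connection
    W, b = params["out"]
    return h @ W + b

params = init_score_net(jax.random.PRNGKey(0), dim=2)
print(score_net(params, jnp.zeros(2), 0.5))
```

In the Hamiltonian (UHA-MCD) variant the network would additionally take the momentum as input; here only position and time are used, and the per-timestep step sizes tuned by gradient ascent on the ELBO are outside the scope of this sketch.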