Surrogate Likelihoods for Variational Annealed Importance Sampling

Authors: Martin Jankowiak, Du Phan

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an extensive empirical comparison we show that our method performs well in practice and that it is well-suited for black-box inference in probabilistic programming frameworks. ... We compare the performance of SL-DAIS and NS-DAIS to various MCMC and variational baselines. Our experiments are implemented using JAX (Bradbury et al., 2020) and NumPyro (Phan et al., 2019; Bingham et al., 2019).
Researcher Affiliation | Collaboration | Martin Jankowiak¹, Du Phan². ¹Broad Institute, Cambridge, MA, USA. ²Google Research, Cambridge, MA, USA.
Pseudocode | Yes | Algorithm 1 SL-DAIS: Surrogate Likelihood Differentiable Annealed Importance Sampling. ... Algorithm 2 NS-DAIS: Naive Subsampling Differentiable Annealed Importance Sampling. ... Algorithm 3 DAIS: Differentiable Annealed Importance Sampling. (See the annealing-path sketch after the table.)
Open Source Code | No | An open source implementation of our method will be made available at https://num.pyro.ai/en/stable/autoguide.html. (See the AutoDAIS usage sketch after the table.)
Open Datasets | Yes | The Higgs and SUSY datasets are described in (Baldi et al., 2014) and available from the UCI repository (Asuncion & Newman, 2007). The MiniBooNE dataset is from (Roe et al., 2005) and is likewise available from UCI. For the two CovType datasets (Blackard & Dean, 1999), which are also available from UCI... The Precipitation dataset is available from the IRI/LDEO Climate Data Library. This spatio-temporal dataset represents the WASP index (Weighted Anomaly Standardized Precipitation) at various latitudes and longitudes.
Dataset Splits | No | The paper explicitly describes train and test data splits (e.g., 'N = 8000 data points for training and use the remaining 2127 data points for testing' in A.7.5 and '20k data points held out for testing' in A.7.2), but it does not specify a separate validation dataset split.
Hardware Specification | Yes | Runtimes are for a CPU with 24 cores (Intel Xeon Gold 5220R 2.2GHz). ... We compare CPU runtime (24 cores; Intel Xeon Gold 5220R 2.2GHz) to GPU runtime (NVIDIA Tesla K80).
Software Dependencies | No | The paper mentions software like JAX and NumPyro but does not provide specific version numbers for them. For example, 'Our experiments are implemented using JAX (Bradbury et al., 2020) and NumPyro (Phan et al., 2019; Bingham et al., 2019).' It also mentions 'Adam optimizer' without a version.
Experiment Setup | Yes | All experiments use 64-bit floating point precision. We use the Adam optimizer with default momentum hyperparameters in all experiments (Kingma & Ba, 2014). In all optimization runs we do 3 × 10^5 optimization steps with an initial learning rate of 10^-3 that drops to 10^-4 and 10^-5 at iterations 10^5 and 2 × 10^5, respectively. Similar to Geffner & Domke (2021) we parameterize the step size η_k in the kth iteration of DAIS/NS-DAIS/SL-DAIS as η_k = clip(η + κ β_k, min = 0, max = η_max), where η and κ are learnable parameters and we choose η_max = 0.25. The inverse temperatures {β_k} are parameterized using the exponential transform to enforce positivity and a cumulative summation to enforce monotonicity. For SL-DAIS the learnable weights ω are uniformly initialized so that their total weight is equal to the number of data points, i.e. ∑_n ω_n = N. In all experiments we use a diagonal mass matrix M. We initialize η to a small value, e.g. η ∈ [10^-4, 10^-2], and initialize κ to κ = 0. We initialize γ to γ = 0.9. (See the parameterization sketch after the table.)
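
For orientation, here is a minimal sketch of the geometric annealing path that underlies Algorithms 1-3 (my rendering in JAX, not the authors' code): the intermediate density at inverse temperature β_k interpolates between a tractable base distribution q(z) and the unnormalized target p(z, x), and each annealing step applies a leapfrog update with a diagonal mass matrix. The log-densities base_log_prob and target_log_prob are illustrative placeholders, not the paper's models.

    import jax
    import jax.numpy as jnp
    from jax.scipy.stats import norm

    # Illustrative stand-ins for the variational base distribution q(z)
    # and the unnormalized target p(z, x).
    def base_log_prob(z):
        return norm.logpdf(z, loc=0.0, scale=1.0).sum()

    def target_log_prob(z):
        return norm.logpdf(z, loc=2.0, scale=0.5).sum()

    def annealed_log_prob(z, beta_k):
        """Geometric bridge used by annealed importance sampling:
        log pi_k(z) = (1 - beta_k) log q(z) + beta_k log p(z, x)."""
        return (1.0 - beta_k) * base_log_prob(z) + beta_k * target_log_prob(z)

    def leapfrog_step(z, v, beta_k, eta_k, mass_diag):
        """One leapfrog step targeting pi_k with a diagonal mass matrix M."""
        grad = jax.grad(annealed_log_prob)(z, beta_k)
        v = v + 0.5 * eta_k * grad
        z = z + eta_k * v / mass_diag
        grad = jax.grad(annealed_log_prob)(z, beta_k)
        v = v + 0.5 * eta_k * grad
        return z, v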
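
Since the code pointer in the Open Source Code row goes to NumPyro's autoguide module, a natural reproduction route is numpyro.infer.autoguide.AutoDAIS. Below is a hedged usage sketch: the model is a toy stand-in for the paper's models, the hyperparameter values merely echo the appendix (η small, η_max = 0.25, γ = 0.9), and the keyword names should be checked against the installed NumPyro version.

    import jax.random as random
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import SVI, Trace_ELBO
    from numpyro.infer.autoguide import AutoDAIS

    # Toy model; the paper's models (logistic regression, GPs, ...) differ.
    def model(y):
        z = numpyro.sample("z", dist.Normal(0.0, 1.0))
        numpyro.sample("y", dist.Normal(z, 0.5), obs=y)

    y = 1.3
    # K annealing steps; eta/gamma initializations mirror the appendix.
    guide = AutoDAIS(model, K=8, eta_init=0.01, eta_max=0.25, gamma_init=0.9)
    svi = SVI(model, guide, numpyro.optim.Adam(1e-3), Trace_ELBO())
    svi_result = svi.run(random.PRNGKey(0), 3000, y)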
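
The parameterizations quoted in the Experiment Setup row can be written down directly. The sketch below shows the clipped step-size rule η_k = clip(η + κ β_k, 0, η_max), the exponential-plus-cumulative-sum construction of monotone inverse temperatures, and the piecewise-constant learning-rate schedule (10^-3 → 10^-4 → 10^-5 at iterations 10^5 and 2 × 10^5). The use of optax here is an assumption, since the paper names only Adam; normalizing the final temperature to 1 is likewise an assumption (it is standard in DAIS).

    import jax.numpy as jnp
    import optax

    def inverse_temperatures(raw):
        """Monotone increasing beta_1 < ... < beta_K from unconstrained
        parameters: exponentiate for positivity, cumulative-sum for
        monotonicity, then normalize so the final temperature is 1
        (normalization assumed, standard in DAIS)."""
        increments = jnp.exp(raw)        # positivity
        betas = jnp.cumsum(increments)   # monotonicity
        return betas / betas[-1]         # beta_K = 1

    def step_size(eta, kappa, beta_k, eta_max=0.25):
        """eta_k = clip(eta + kappa * beta_k, 0, eta_max), as quoted above."""
        return jnp.clip(eta + kappa * beta_k, 0.0, eta_max)

    # Piecewise-constant schedule: 1e-3, dropping to 1e-4 and 1e-5 at
    # iterations 1e5 and 2e5, over 3e5 total optimization steps.
    schedule = optax.piecewise_constant_schedule(
        init_value=1e-3,
        boundaries_and_scales={100_000: 0.1, 200_000: 0.1},
    )
    optimizer = optax.adam(learning_rate=schedule)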