Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap
Authors: Edwin Fong, Simon Lyddon, Chris Holmes
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this on Gaussian mixture model and sparse logistic regression examples. We compare NPL to conventional Bayesian inference with the No-U-Turn Sampler (NUTS) and Automatic Differentiation Variational Inference (ADVI). We evaluate the predictive performance of each method on held-out test data. |
| Researcher Affiliation | Academia | (1) Department of Statistics, University of Oxford, Oxford, United Kingdom; (2) The Alan Turing Institute, London, United Kingdom. |
| Pseudocode | Yes | Algorithm 1 NPL Posterior Sampling; Algorithm 2 Posterior Bootstrap Sampling; Algorithm 3 RR-NPL Posterior Sampling; Algorithm 4 FI-NPL Posterior Sampling. |
| Open Source Code | Yes | We now demonstrate our method on some examples; the code is available online [1]. [1] https://github.com/edfong/npl |
| Open Datasets | Yes | We analyze 3 binary classification datasets from the UCI ML repository (Dheeru & Karra Taniskidou, 2017): Adult (Kohavi, 1996), Polish companies bankruptcy 3rd year (Zięba et al., 2016), and Arcene (Guyon et al., 2005) with details in Table 3. MNIST (LeCun & Cortes, 2010). |
| Dataset Splits | Yes | We generate n_train = 1000 for model fitting and another n_test = 250 held-out for model evaluation with different seeds for each of the 30 runs. We carry out a random stratified train-test split for each of the 30 runs, with 80-20 split for Adult, Polish and 50-50 split for Arcene due to the smaller dataset. |
| Hardware Specification | Yes | All NPL examples are run on 4 Azure F72s v2 (72 vCPUs) virtual machines, implemented in Python. The NUTS and ADVI examples cannot be implemented in an embarrassingly parallel manner, so they are run on a single Azure F72s v2. |
| Software Dependencies | No | The paper mentions software like 'Python', 'sklearn.mixture', 'scipy.optimize', and 'Stan' but does not specify exact version numbers for these dependencies. |
| Experiment Setup | Yes | For the Bayesian model we set a_0 = 1, and for NPL we set α = 0 as n ≫ p. We optimize each bootstrap maximization with a weighted EM algorithm... For RR-NPL, we initialize π ∼ Dir(1, ..., 1), μ_kj ∼ Unif(−2, 6) and σ²_kj ∼ IG(1, 1) for each restart. For FI-NPL we initialize with one of the posterior modes from RR-NPL. We produce 2000 posterior samples for each method. |
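
The random-restart initialization quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not taken from the authors' repository. The function name `draw_rr_npl_init`, the number of components, and the dimension are hypothetical choices for illustration. The lower bound of −2 on the uniform assumes the minus sign was lost in text extraction, and IG(1, 1) is assumed to mean an inverse-gamma with shape 1 and scale 1.

```python
import random


def draw_rr_npl_init(n_components, n_dims, rng):
    """Draw one random initialization (pi, mu, sigma2) for a diagonal GMM.

    Hypothetical helper: mirrors the distributions quoted in the table,
    not the authors' actual code.
    """
    # pi ~ Dir(1, ..., 1): normalize independent Gamma(shape=1, scale=1) draws.
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(n_components)]
    total = sum(gammas)
    pi = [g / total for g in gammas]

    # mu_kj ~ Unif(-2, 6); the lower bound -2 is an assumption (see lead-in).
    mu = [[rng.uniform(-2.0, 6.0) for _ in range(n_dims)]
          for _ in range(n_components)]

    # sigma2_kj ~ IG(1, 1): reciprocal of a Gamma(shape=1, rate=1) draw.
    sigma2 = [[1.0 / rng.gammavariate(1.0, 1.0) for _ in range(n_dims)]
              for _ in range(n_components)]
    return pi, mu, sigma2


# Example: one restart for an assumed 3-component, 1-dimensional mixture.
rng = random.Random(0)
pi, mu, sigma2 = draw_rr_npl_init(n_components=3, n_dims=1, rng=rng)
```

In a random-restart scheme, each bootstrap maximization would call a helper like this once per restart and keep the best local optimum; how the paper selects among restarts is not stated in the table above.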