Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Automatic Reparameterisation of Probabilistic Programs
Authors: Maria Gorinova, Dave Moore, Matthew Hoffman
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare these strategies to a fixed centred and non-centred parameterisation across a range of well-known hierarchical models. Our results suggest that both VIP and iHMC can enable for more automated robust inference, often performing at least as well as the best fixed parameterisation and sometimes better, without requiring a priori knowledge of the optimal parameterisation. |
| Researcher Affiliation | Collaboration | Maria I. Gorinova*¹, Dave Moore², Matthew D. Hoffman². *Work done while interning at Google. ¹University of Edinburgh, Edinburgh, UK; ²Google, San Francisco, CA, USA. |
| Pseudocode | Yes | Algorithm 1: Interleaved Hamiltonian Monte Carlo; Algorithm 2: Variationally Inferred Parameterisation |
| Open Source Code | Yes | Code for these algorithms and experiments is available at https://github.com/mgorinova/autoreparam. |
| Open Datasets | Yes | Eight schools (Rubin, 1981): estimating the treatment effects θ_i of a course taught at each of i = 1, ..., 8 schools, given test scores y_i and standard errors σ_i; Radon (Gelman & Hill, 2006): hierarchical linear regression, in which the radon level r_i in a home i in county c is modelled as a function of the (unobserved) county-level effect m_c, the county uranium reading u_c, and x_i, the number of floors in the home; German credit (Dua & Graff, 2017): logistic regression with a hierarchical prior on coefficient scales; Election 88 (Gelman & Hill, 2006): logistic model of 1988 US presidential election outcomes by county, given demographic covariates x_i and state-level effects α_s; Electric Company (Gelman & Hill, 2006): paired causal analysis of the effect of viewing an educational TV show on each of 192 classrooms over G = 4 grades. |
| Dataset Splits | No | The paper does not provide specific dataset splits (e.g., percentages or counts) for training, validation, or testing. It mentions using variational optimization, which serves a validation purpose, but not in terms of dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | We implement reparameterisation handlers in Edward2, a deep PPL embedded in Python and TensorFlow (Tran et al., 2018). No version numbers for Edward2, Python, or TensorFlow are specified. |
| Experiment Setup | Yes | The HMC step size and number of leapfrog steps were tuned following the procedures described in Appendix C, which also contains additional details of the experimental setup. |
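
The Research Type row contrasts centred and non-centred parameterisations. As a minimal illustration of that distinction only (not the paper's Edward2 implementation, and not its VIP or iHMC algorithms), the NumPy sketch below draws a group-level effect in both forms and checks that they describe the same distribution. The function names and the values of `mu` and `tau` are hypothetical, chosen for illustration.

```python
import numpy as np


def sample_centred(mu, tau, size, rng):
    """Centred form: draw theta directly from Normal(mu, tau)."""
    return rng.normal(loc=mu, scale=tau, size=size)


def sample_non_centred(mu, tau, size, rng):
    """Non-centred form: draw a standard normal z, then shift and scale it."""
    z = rng.normal(loc=0.0, scale=1.0, size=size)
    return mu + tau * z


rng = np.random.default_rng(0)
mu, tau = 2.0, 0.5  # hypothetical location and scale, not taken from the paper

centred = sample_centred(mu, tau, 100_000, rng)
non_centred = sample_non_centred(mu, tau, 100_000, rng)

# Both forms target the same marginal distribution for theta;
# they differ only in which variable a sampler would operate on.
print("means:", centred.mean(), non_centred.mean())
print("stds: ", centred.std(), non_centred.std())
```

The two forms are equivalent as distributions, but the geometry of the posterior over `theta` versus `z` can differ substantially, which is why the choice of parameterisation can affect inference in hierarchical models.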