Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Measuring Sample Quality with Stein's Method
Authors: Jackson Gorham, Lester Mackey
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now turn to an empirical evaluation of our proposed quality measures. We compute all spanners using the efficient C++ greedy spanner implementation of Bouts et al. [19] and solve all optimization programs using Julia for Mathematical Programming [20] with the default Gurobi 6.0.4 solver [21]. All reported timings are obtained using a single core of an Intel Xeon CPU E5-2650 v2 @ 2.60GHz. 5.1 A Simple Example: We begin with a simple example to illuminate a few properties of the Stein diagnostic. For the target P = N(0, 1), we generate a sequence of sample points i.i.d. from the target and a second sequence i.i.d. from a scaled Student's t distribution with matching variance and 10 degrees of freedom. The left panel of Figure 1 shows that the complete graph Stein discrepancy applied to the first n Gaussian sample points decays to zero at an n^{-0.52} rate, while the discrepancy applied to the scaled Student's t sample remains bounded away from zero. |
| Researcher Affiliation | Academia | Jackson Gorham Department of Statistics Stanford University Lester Mackey Department of Statistics Stanford University |
| Pseudocode | Yes | Algorithm 1 Multivariate Spanner Stein Discrepancy Algorithm 2 Univariate Complete Graph Stein Discrepancy |
| Open Source Code | No | The paper mentions using third-party open-source tools like "C++ greedy spanner implementation of Bouts et al. [19]" and "Julia for Mathematical Programming [20] with the default Gurobi 6.0.4 solver [21]", but does not state that the authors' own developed methodology code is open-source or provide a link to it. |
| Open Datasets | No | The paper refers to target distributions like N(0,1) or Unif(0,1), or uses datasets referenced by a paper citation for context (e.g., "bimodal Gaussian mixture model (GMM) posterior of [3]", "dataset of 53 prostate cancer patients... [24]"), but it does not provide concrete access information (like a direct URL, DOI, or repository) for these datasets, nor does it explicitly state they are publicly available with proper attribution. |
| Dataset Splits | No | The paper does not specify exact train/validation/test dataset splits, percentages, or absolute sample counts for reproducibility. It discusses sample sizes (e.g., "sequences of length n = 1000") but not data partitioning for training, validation, and testing. |
| Hardware Specification | Yes | All reported timings are obtained using a single core of an Intel Xeon CPU E5-2650 v2 @ 2.60GHz. |
| Software Dependencies | Yes | All reported timings are obtained using a single core of an Intel Xeon CPU E5-2650 v2 @ 2.60GHz. We compute all spanners using the efficient C++ greedy spanner implementation of Bouts et al. [19] and solve all optimization programs using Julia for Mathematical Programming [20] with the default Gurobi 6.0.4 solver [21]. |
| Experiment Setup | Yes | For a range of step sizes ε, we use SGLD with minibatch size 5 to draw 50 independent sequences of length n = 1000, and we select the value of ε with the highest median quality, either the maximum effective sample size (ESS, a standard diagnostic based on autocorrelation [1]) or the minimum spanner Stein discrepancy, across these sequences. |
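
The step-size selection protocol quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the toy target (the posterior of a Gaussian mean under a N(0, 1) prior) stands in for the paper's posterior, the ESS estimator is a simple truncated-autocorrelation version, and every function name here (`sgld_chain`, `effective_sample_size`, `select_step_size`) is hypothetical. The spanner Stein discrepancy is not implemented; if it were available as a per-sequence score, it could be passed to `select_step_size` with `higher_is_better=False`.

```python
import math
import random
import statistics


def sgld_chain(data, step, n_steps, batch_size, rng):
    """Toy SGLD for the posterior of a Gaussian mean (unit variance, N(0, 1) prior).

    Assumption: this target is a stand-in chosen for illustration only.
    """
    theta, chain = 0.0, []
    scale = len(data) / batch_size
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Stochastic gradient of the log posterior:
        # prior term plus the rescaled minibatch likelihood term.
        grad = -theta + scale * sum(x - theta for x in batch)
        # Langevin update: half-step along the gradient plus N(0, step) noise.
        theta += 0.5 * step * grad + math.sqrt(step) * rng.gauss(0.0, 1.0)
        chain.append(theta)
    return chain


def effective_sample_size(chain, max_lag=50):
    """ESS = n / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive lag or at max_lag.
    """
    n = len(chain)
    mean = statistics.fmean(chain)
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return 0.0  # degenerate constant chain; treated here as uninformative
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def select_step_size(qualities, higher_is_better=True):
    """Return the step size with the best median quality, plus all medians.

    qualities maps each step size to a list of per-sequence quality scores.
    """
    medians = {eps: statistics.median(scores) for eps, scores in qualities.items()}
    pick = max if higher_is_better else min
    return pick(medians, key=medians.get), medians


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(1.0, 1.0) for _ in range(100)]  # synthetic observations
    step_sizes = [1e-4, 1e-3, 1e-2]                   # hypothetical grid of step sizes
    qualities = {
        eps: [
            effective_sample_size(sgld_chain(data, eps, n_steps=1000, batch_size=5, rng=rng))
            for _ in range(50)  # 50 independent sequences per step size
        ]
        for eps in step_sizes
    }
    best_eps, medians = select_step_size(qualities)
    print("selected step size by median ESS:", best_eps)
```

The selection rule is kept separate from the quality measure so that the same median-across-sequences comparison can be reused for either diagnostic. Only the direction of the comparison changes: maximize for ESS, minimize for a discrepancy.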