Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Measuring Sample Quality with Kernels
Authors: Jackson Gorham, Lester Mackey
ICML 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next conduct an empirical evaluation of the KSD quality measures recommended by our theory, recording all timings on an Intel Xeon CPU E5-2650 v2 @ 2.60GHz. Throughout, we will refer to the KSD with IMQ base kernel k(x, y) = (c^2 + ‖x − y‖_2^2)^β, exponent β = −1/2, and c = 1 as the IMQ KSD. Code reproducing all experiments can be found on the Julia (Bezanson et al., 2014) package site https://jgorham.github.io/SteinDiscrepancy.jl/. |
| Researcher Affiliation | Collaboration | 1Stanford University, Palo Alto, CA USA 2Microsoft Research New England, Cambridge, MA USA. |
| Pseudocode | No | No structured pseudocode or algorithm blocks (e.g., a clearly labeled 'Algorithm' or 'Pseudocode' section) were found in the paper. |
| Open Source Code | Yes | Code reproducing all experiments can be found on the Julia (Bezanson et al., 2014) package site https://jgorham.github.io/SteinDiscrepancy.jl/. |
| Open Datasets | Yes | Specifically, we evaluate the SGFS-f and SGFS-d samples produced in (Ahn et al., 2012, Sec. 5.1). The target P is a Bayesian logistic regression with a flat prior, conditioned on a dataset of 10^4 MNIST handwritten digit images. |
| Dataset Splits | No | The paper describes generating sample sequences (e.g., 'generated 50 independent approximate slice sampling chains') and evaluating their quality, but does not specify traditional train/validation/test dataset splits with percentages or counts as would be found in supervised learning. |
| Hardware Specification | Yes | recording all timings on an Intel Xeon CPU E5-2650 v2 @ 2.60GHz. |
| Software Dependencies | No | The paper mentions 'Julia (Bezanson et al., 2014)' as the platform for their code but does not list specific software dependencies or libraries with their version numbers required for reproduction. |
| Experiment Setup | Yes | For an array of ε values, we generated 50 independent approximate slice sampling chains with batch size 5, each with a budget of 148000 likelihood evaluations, and plotted the median IMQ KSD and effective sample size (ESS, a standard sample quality measure based on asymptotic variance (Brooks et al., 2011)) in Figure 3. |
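
The IMQ base kernel quoted in the Research Type row can be written out in a few lines. The following is a minimal Python sketch, not the authors' Julia implementation: the function name `imq_kernel` and its argument layout are hypothetical, and the defaults assume the quoted settings c = 1 and β = −1/2 (a negative exponent is what makes the kernel an inverse multiquadric). It evaluates only the base kernel, not the full kernel Stein discrepancy.

```python
def imq_kernel(x, y, c=1.0, beta=-0.5):
    """Inverse multiquadric base kernel k(x, y) = (c^2 + ||x - y||_2^2)^beta.

    Hypothetical helper for illustration only; the defaults c = 1 and
    beta = -1/2 are assumed from the settings quoted in the table above.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    # Squared Euclidean distance between the two points.
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return (c ** 2 + sq_dist) ** beta


# Example evaluation on two points in the plane.
value_same = imq_kernel([0.0, 0.0], [0.0, 0.0])
value_far = imq_kernel([3.0, 4.0], [0.0, 0.0])
```

With these defaults the kernel equals 1 when the two points coincide and decays polynomially, rather than exponentially, as they move apart.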