A Kernelized Stein Discrepancy for Goodness-of-fit Tests

Authors: Qiang Liu, Jason Lee, Michael Jordan

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results in this section. We start with a toy case of 1D Gaussian mixture on which we can compare with the classical goodness-of-fit tests that only work for univariate distributions, and then proceed to Gaussian-Bernoulli restricted Boltzmann machine (RBM), a graphical model widely used in deep learning (Welling et al., 2004; Hinton & Salakhutdinov, 2006). The following methods are evaluated, all with a significance level of 0.05.
Researcher Affiliation | Academia | Qiang Liu (QLIU@CS.DARTMOUTH.EDU), Computer Science, Dartmouth College, NH 03755; Jason D. Lee (JASONDLEE88@EECS.BERKELEY.EDU) and Michael Jordan (JORDAN@CS.BERKELEY.EDU), Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94709.
Pseudocode | Yes | Algorithm 1 (Bootstrap Goodness-of-fit Test based on KSD). Input: sample {x_i} and score function s_q(x) = ∇_x log q(x); bootstrap sample size m. Test: H0: {x_i} is drawn from q vs. H1: {x_i} is not drawn from q. 1. Compute Ŝ_u by (14), with u_q(x, x′) as defined in Theorem 3.6; generate m bootstrap samples Ŝ*_u by (16). 2. Reject H0 at significance level α if the fraction of bootstrap samples satisfying Ŝ*_u > Ŝ_u is less than α.
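As a concrete illustration of Algorithm 1, here is a minimal sketch of the bootstrap KSD test for a one-dimensional target. It uses the closed-form derivatives of an RBF kernel to evaluate u_q(x, x′) and a multinomial weighted bootstrap for the resampled statistics; the function names, the median-distance bandwidth choice, and the exact bootstrap weighting are our assumptions, not the authors' released code.

```python
import numpy as np

def ksd_u_matrix(x, score, h):
    """Matrix of u_q(x_i, x_j) for a 1D RBF kernel k(x, x') = exp(-(x-x')^2 / (2h^2)).

    u_q(x, x') = s_q(x) k s_q(x') + s_q(x) ∂k/∂x' + s_q(x') ∂k/∂x + ∂²k/∂x∂x',
    where score(x) returns s_q(x) = d/dx log q(x).
    """
    diff = x[:, None] - x[None, :]            # x_i - x_j
    k = np.exp(-diff**2 / (2 * h**2))         # kernel matrix
    s = score(x)                              # score evaluated at each sample
    dk_dxj = k * diff / h**2                  # ∂k/∂x_j in closed form
    dk_dxi = -k * diff / h**2                 # ∂k/∂x_i
    d2k = k * (1.0 / h**2 - diff**2 / h**4)   # ∂²k/∂x_i∂x_j
    return (s[:, None] * s[None, :] * k
            + s[:, None] * dk_dxj
            + s[None, :] * dk_dxi
            + d2k)

def ksd_bootstrap_test(x, score, m=1000, alpha=0.05, seed=0):
    """Sketch of Algorithm 1: U-statistic KSD estimate plus weighted bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = np.median(np.abs(x[:, None] - x[None, :]))  # median-distance bandwidth
    u = ksd_u_matrix(x, score, h)
    np.fill_diagonal(u, 0.0)                  # U-statistic excludes i == j terms
    s_hat = u.sum() / (n * (n - 1))           # estimate of S_u
    boots = np.empty(m)
    for b in range(m):
        # Multinomial weights, centered so they sum to zero.
        w = rng.multinomial(n, np.ones(n) / n) / n - 1.0 / n
        boots[b] = w @ u @ w                  # diagonal already zeroed above
    p_value = np.mean(boots >= s_hat)
    return s_hat, p_value, p_value < alpha
```

For example, testing a sample against a standard normal target uses `score = lambda x: -x`; a sample drawn far from the target should be rejected, while a sample drawn from the target itself typically is not.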
Open Source Code | No | The paper mentions using a third-party resource for MMD: "We use the mmdTestBoot.m under http://www.gatsby.ucl.ac.uk/%7Egretton/mmd/mmd.htm". However, there is no statement or link indicating that the authors released their own source code for the KSD methodology described in this paper.
Open Datasets | No | The paper describes generating its own data for experiments: "We draw i.i.d. sample {x_i}_{i=1}^n from p(x) = Σ_{k=1}^5 w_k N(x; μ_k, σ²) with w_k = 1/5, σ = 1 and μ_k randomly drawn from Uniform[0, 10]. We then generate q(x) by adding Gaussian noise on μ_k, log w_k, or log σ², leading to three different ways for perturbation..." and "In our experiment, we simulate a true model p(x) by drawing b and c from standard Gaussian and select B uniformly randomly from {±1}; we use d = 50 observable variables and d′ = 10 hidden variables...". It does not mention using or providing access to any publicly available dataset.
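A sketch of the synthetic 1D Gaussian-mixture data generation quoted above; the noise scale used to perturb the means into q(x) is an assumption, since the excerpt does not fix it.

```python
import numpy as np

rng = np.random.default_rng(0)
K, sigma = 5, 1.0
mu = rng.uniform(0.0, 10.0, K)    # component means mu_k ~ Uniform[0, 10]
w = np.full(K, 1.0 / K)           # equal weights w_k = 1/5

def sample_mixture(n, mu, w, sigma, rng):
    """Draw n i.i.d. samples from p(x) = sum_k w_k N(x; mu_k, sigma^2)."""
    comp = rng.choice(len(w), size=n, p=w)   # pick a component per sample
    return rng.normal(mu[comp], sigma)       # then sample from that Gaussian

x = sample_mixture(1000, mu, w, sigma, rng)

# One of the paper's three perturbations: q(x) obtained by adding Gaussian
# noise to the means (the 0.5 noise scale here is our assumption).
mu_q = mu + rng.normal(0.0, 0.5, K)
```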
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a splitting methodology) for training, validation, or testing. It describes synthetic data generation and overall sample sizes, but no partition into standard training, validation, and test sets.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers. It only mentions using "mmdTestBoot.m" for MMD, without version information for that software.
Experiment Setup | Yes | All methods are evaluated at a significance level of 0.05. KSD-U uses an RBF kernel with bandwidth set to the median of the pairwise data distances, and a bootstrap size of 1000. MMD-MCMC uses 1000 burn-in steps. The Gaussian-Bernoulli RBM experiments use d = 50 observable variables and d′ = 10 hidden variables.
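The KSD test only needs the score function of the RBM, which is available in closed form once the ±1 hidden units are marginalized out: log p(x) = b·x − ||x||²/2 + Σ_j log cosh(B_j·x + c_j) + const, so s_p(x) = b − x + B tanh(Bᵀx + c). The sketch below assumes unit observation variance and follows the parameter sampling quoted earlier (b, c standard Gaussian, B entries uniform on {±1}); the function name `gb_rbm_score` is ours.

```python
import numpy as np

def gb_rbm_score(x, B, b, c):
    """Score s_p(x) = ∇_x log p(x) of a Gaussian-Bernoulli RBM with
    hidden units h ∈ {±1} marginalized out analytically:
    log p(x) = b^T x - ||x||^2 / 2 + sum_j log cosh(B_j^T x + c_j) + const.
    """
    return b - x + B @ np.tanh(B.T @ x + c)

rng = np.random.default_rng(0)
d, d_hidden = 50, 10                            # 50 observable, 10 hidden units
B = rng.choice([-1.0, 1.0], size=(d, d_hidden)) # entries uniform on {±1}
b = rng.normal(size=d)                          # visible biases, standard Gaussian
c = rng.normal(size=d_hidden)                   # hidden biases, standard Gaussian

x = rng.normal(size=d)
s = gb_rbm_score(x, B, b, c)
```

A useful sanity check on the derivation is that the returned score matches a finite-difference gradient of the unnormalized log-density.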