A Kernelized Stein Discrepancy for Goodness-of-fit Tests

Authors: Qiang Liu, Jason Lee, Michael Jordan

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results in this section. We start with a toy case of 1D Gaussian mixture on which we can compare with the classical goodness-of-fit tests that only work for univariate distributions, and then proceed to Gaussian-Bernoulli restricted Boltzmann machine (RBM), a graphical model widely used in deep learning (Welling et al., 2004; Hinton & Salakhutdinov, 2006). The following methods are evaluated, all with a significance level of 0.05.
Researcher Affiliation | Academia | Qiang Liu (QLIU@CS.DARTMOUTH.EDU), Computer Science, Dartmouth College, NH 03755; Jason D. Lee (JASONDLEE88@EECS.BERKELEY.EDU) and Michael Jordan (JORDAN@CS.BERKELEY.EDU), Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94709.
Pseudocode | Yes | Algorithm 1 (Bootstrap Goodness-of-fit Test based on KSD). Input: sample {x_i} and score function s_q(x) = ∇_x log q(x); bootstrap sample size m. Test: H0: {x_i} is drawn from q vs. H1: {x_i} is not drawn from q. 1. Compute Ŝ_u by (14), with u_q(x, x′) as defined in Theorem 3.6; generate m bootstrap samples Ŝ*_u by (16). 2. Reject H0 at significance level α if the fraction of bootstrap samples satisfying Ŝ*_u > Ŝ_u is less than α.
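As a concrete illustration of Algorithm 1, here is a minimal sketch of the bootstrap KSD test for a one-dimensional target. It uses the closed-form derivatives of an RBF kernel to evaluate u_q(x, x′) and a multinomial weighted bootstrap for the resampled statistics; the function names, the median-distance bandwidth choice, and the exact bootstrap weighting are our assumptions, not the authors' released code.

```python
import numpy as np

def ksd_u_matrix(x, score, h):
    """Matrix of u_q(x_i, x_j) for a 1D RBF kernel k(x, x') = exp(-(x-x')^2 / (2h^2)).

    u_q(x, x') = s_q(x) k s_q(x') + s_q(x) ∂k/∂x' + s_q(x') ∂k/∂x + ∂²k/∂x∂x',
    where score(x) returns s_q(x) = d/dx log q(x).
    """
    diff = x[:, None] - x[None, :]            # x_i - x_j
    k = np.exp(-diff**2 / (2 * h**2))         # kernel matrix
    s = score(x)                              # score evaluated at each sample
    dk_dxj = k * diff / h**2                  # ∂k/∂x_j in closed form
    dk_dxi = -k * diff / h**2                 # ∂k/∂x_i
    d2k = k * (1.0 / h**2 - diff**2 / h**4)   # ∂²k/∂x_i∂x_j
    return (s[:, None] * s[None, :] * k
            + s[:, None] * dk_dxj
            + s[None, :] * dk_dxi
            + d2k)

def ksd_bootstrap_test(x, score, m=1000, alpha=0.05, seed=0):
    """Sketch of Algorithm 1: U-statistic KSD estimate plus weighted bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = np.median(np.abs(x[:, None] - x[None, :]))  # median-distance bandwidth
    u = ksd_u_matrix(x, score, h)
    np.fill_diagonal(u, 0.0)                  # U-statistic excludes i == j terms
    s_hat = u.sum() / (n * (n - 1))           # estimate of S_u
    boots = np.empty(m)
    for b in range(m):
        # Multinomial weights, centered so they sum to zero.
        w = rng.multinomial(n, np.ones(n) / n) / n - 1.0 / n
        boots[b] = w @ u @ w                  # diagonal already zeroed above
    p_value = np.mean(boots >= s_hat)
    return s_hat, p_value, p_value < alpha
```

For example, testing a sample against a standard normal target uses `score = lambda x: -x`; a sample drawn far from the target should be rejected, while a sample drawn from the target itself typically is not.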
Open Source Code | No | The paper mentions using a third-party resource for MMD: "We use the mmdTestBoot.m under http://www.gatsby.ucl.ac.uk/%7Egretton/mmd/mmd.htm". However, there is no statement or link indicating that the authors released their own source code for the KSD methodology described in this paper.
Open Datasets | No | The paper describes generating its own data for experiments: "We draw i.i.d. sample {x_i}_{i=1}^n from p(x) = Σ_{k=1}^5 w_k N(x; μ_k, σ²) with w_k = 1/5, σ = 1 and μ_k randomly drawn from Uniform[0, 10]. We then generate q(x) by adding Gaussian noise on μ_k, log w_k, or log σ², leading to three different ways for perturbation..." and "In our experiment, we simulate a true model p(x) by drawing b and c from standard Gaussian and select B uniformly randomly from {±1}; we use d = 50 observable variables and d′ = 10 hidden variables...". It does not mention using or providing access to any publicly available dataset.
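A sketch of the synthetic 1D Gaussian-mixture data generation quoted above; the noise scale used to perturb the means into q(x) is an assumption, since the excerpt does not fix it.

```python
import numpy as np

rng = np.random.default_rng(0)
K, sigma = 5, 1.0
mu = rng.uniform(0.0, 10.0, K)    # component means mu_k ~ Uniform[0, 10]
w = np.full(K, 1.0 / K)           # equal weights w_k = 1/5

def sample_mixture(n, mu, w, sigma, rng):
    """Draw n i.i.d. samples from p(x) = sum_k w_k N(x; mu_k, sigma^2)."""
    comp = rng.choice(len(w), size=n, p=w)   # pick a component per sample
    return rng.normal(mu[comp], sigma)       # then sample from that Gaussian

x = sample_mixture(1000, mu, w, sigma, rng)

# One of the paper's three perturbations: q(x) obtained by adding Gaussian
# noise to the means (the 0.5 noise scale here is our assumption).
mu_q = mu + rng.normal(0.0, 0.5, K)
```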
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a splitting methodology) for training, validation, or testing. It describes synthetic data generation and overall sample sizes, but no partition into standard training, validation, and test sets.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers. It only mentions using "mmdTestBoot.m" for MMD, without version information for that software.
Experiment Setup | Yes | All methods are evaluated at a significance level of 0.05. KSD-U uses an RBF kernel with bandwidth set to the median of the pairwise data distances, and a bootstrap size of 1000. MMD-MCMC uses 1000 burn-in steps. The Gaussian-Bernoulli RBM experiments use d = 50 observable variables and d′ = 10 hidden variables.
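The KSD test only needs the score function of the RBM, which is available in closed form once the ±1 hidden units are marginalized out: log p(x) = b·x − ||x||²/2 + Σ_j log cosh(B_j·x + c_j) + const, so s_p(x) = b − x + B tanh(Bᵀx + c). The sketch below assumes unit observation variance and follows the parameter sampling quoted earlier (b, c standard Gaussian, B entries uniform on {±1}); the function name `gb_rbm_score` is ours.

```python
import numpy as np

def gb_rbm_score(x, B, b, c):
    """Score s_p(x) = ∇_x log p(x) of a Gaussian-Bernoulli RBM with
    hidden units h ∈ {±1} marginalized out analytically:
    log p(x) = b^T x - ||x||^2 / 2 + sum_j log cosh(B_j^T x + c_j) + const.
    """
    return b - x + B @ np.tanh(B.T @ x + c)

rng = np.random.default_rng(0)
d, d_hidden = 50, 10                            # 50 observable, 10 hidden units
B = rng.choice([-1.0, 1.0], size=(d, d_hidden)) # entries uniform on {±1}
b = rng.normal(size=d)                          # visible biases, standard Gaussian
c = rng.normal(size=d_hidden)                   # hidden biases, standard Gaussian

x = rng.normal(size=d)
s = gb_rbm_score(x, B, b, c)
```

A useful sanity check on the derivation is that the returned score matches a finite-difference gradient of the unnormalized log-density.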