Goodness-of-Fit Testing for Discrete Distributions via Stein Discrepancy

Authors: Jiasen Yang, Qiang Liu, Vinayak Rao, Jennifer Neville

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply the proposed goodness-of-fit test to three statistical models involving discrete distributions, and our experiments show that the proposed test typically outperforms a two-sample test based on the maximum mean discrepancy.
Researcher Affiliation | Academia | (1) Department of Statistics, Purdue University, West Lafayette, IN; (2) Department of Computer Science, The University of Texas at Austin, Austin, TX; (3) Department of Computer Science, Purdue University, West Lafayette, IN.
Pseudocode | Yes | Algorithm 1 Goodness-of-fit testing via KDSD
Open Source Code | No | The paper does not explicitly state that its source code for the methodology is released or provide a link to it.
Open Datasets | No | The paper describes generating samples from models (Ising, Bernoulli RBM, ERGM) and mentions using 'ergm R package (Handcock et al., 2017)' for ERGM, but does not specify a publicly available or open dataset that is used for training.
Dataset Splits | No | The paper describes drawing samples for hypothesis testing (n samples from q for KDSD, and n from q and n from p for MMD) but does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments.
Software Dependencies | Yes | We utilize the ergm R package (Handcock et al., 2017). R package version 3.8.0.
Experiment Setup | Yes | We set m = 5000 for both methods throughout. ... significance level α = 0.05. ... We consider a periodic 10-by-10 lattice, with d = 100 random variables. We focus on the ferromagnetic setting and set θ_ij = 1/T, where T is the temperature of the system. ... We use M = 50 visible units and K = 25 hidden units. We draw the entries of the weight matrix W i.i.d. from a Normal distribution with mean zero and standard deviation 1/M, and the entries of the bias terms b and c i.i.d. from the standard Normal distribution. ... We consider an ERGM distribution for undirected graphs on 20 nodes, with the dimension of each sample d = (20 choose 2) = 190. We fix θ_1 = 2 and τ = 0.01.
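
The quantities quoted in the Experiment Setup row can be reproduced with a few lines of Python. The block below is only an illustrative sketch, not the authors' code, since the paper does not release an implementation. All function names are hypothetical, the random seed is arbitrary, and treating b as the visible bias and c as the hidden bias is an assumption. The sketch builds the couplings of the periodic 10-by-10 Ising lattice, draws RBM parameters with the stated standard deviations, and checks the ERGM sample dimension.

```python
import math
import random


def periodic_lattice_edges(side):
    """Nearest-neighbour edges of a side-by-side lattice with wrap-around boundaries."""
    edges = set()
    for r in range(side):
        for c in range(side):
            i = r * side + c
            right = r * side + (c + 1) % side      # wraps to column 0
            down = ((r + 1) % side) * side + c     # wraps to row 0
            edges.add((min(i, right), max(i, right)))
            edges.add((min(i, down), max(i, down)))
    return sorted(edges)


def ising_couplings(side, temperature):
    """Ferromagnetic couplings theta_ij = 1/T on every lattice edge."""
    theta = 1.0 / temperature
    return {edge: theta for edge in periodic_lattice_edges(side)}


def rbm_parameters(num_visible, num_hidden, seed=0):
    """Draw W ~ Normal(0, (1/M)^2) and biases ~ Normal(0, 1).

    Assumption: b is the visible bias and c is the hidden bias.
    """
    rng = random.Random(seed)
    sd = 1.0 / num_visible
    W = [[rng.gauss(0.0, sd) for _ in range(num_hidden)]
         for _ in range(num_visible)]
    b = [rng.gauss(0.0, 1.0) for _ in range(num_visible)]
    c = [rng.gauss(0.0, 1.0) for _ in range(num_hidden)]
    return W, b, c


def ergm_dimension(num_nodes):
    """Number of possible edges in an undirected graph without self-loops."""
    return math.comb(num_nodes, 2)
```

For example, `ising_couplings(10, temperature=5.0)` assigns the same coupling to each of the 200 edges of the 10-by-10 torus; the temperature 5.0 is a placeholder, not a value taken from the paper. `ergm_dimension(20)` gives the 190 edge indicators quoted for the 20-node graphs.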