KSD Aggregated Goodness-of-fit Test
Authors: Antonin Schrab, Benjamin Guedj, Arthur Gretton
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find on both synthetic and real-world data that KSDAGG outperforms other state-of-the-art quadratic-time adaptive KSD-based goodness-of-fit testing procedures. We discuss the implementation of KSDAGG and experimentally validate our proposed approach on benchmark problems, not only on datasets classically used in the literature but also on original data obtained using state-of-the-art generative models (i.e. Normalizing Flows). |
| Researcher Affiliation | Academia | Antonin Schrab Centre for Artificial Intelligence Gatsby Computational Neuroscience Unit University College London & Inria London a.schrab@ucl.ac.uk Benjamin Guedj Centre for Artificial Intelligence University College London & Inria London b.guedj@ucl.ac.uk Arthur Gretton Gatsby Computational Neuroscience Unit University College London arthur.gretton@gmail.com |
| Pseudocode | Yes | Algorithm 1 KSDAGG |
| Open Source Code | Yes | Contributing to the real-world applications of these goodness-of-fit tests, we provide publicly available code to allow practitioners to employ our method: https://github.com/antoninschrab/ksdagg-paper. |
| Open Datasets | Yes | MNIST dataset (Le Cun et al., 1998, 2010) |
| Dataset Splits | No | The paper uses various datasets (Gamma, GBRBM, MNIST Normalizing Flow) but does not explicitly provide details about train/validation/test splits for its experiments. For MNIST, it mentions a pre-trained model but not the experimental splits for the KSDAGG tests. |
| Hardware Specification | Yes | All experiments have been run on an AMD Ryzen Threadripper 3960X 24 Cores 128Gb RAM CPU at 3.8GHz |
| Software Dependencies | No | The paper mentions using third-party implementations ('Jitkrittum et al. (2017)' and 'Phillip Lippe’s implementation') but does not specify any software dependencies with version numbers for its own code or key libraries. |
| Experiment Setup | Yes | All our experiments are run with level = 0.05 using the IMQ kernel defined in Equation (7) with parameter β_k = 0.5. We use a parametric bootstrap with B1 = B2 = 500 bootstrapped KSD values to compute the adjusted test thresholds, and B3 = 50 steps of bisection method to estimate the correction u in Equation (6). |
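The IMQ (inverse multiquadric) kernel referenced in the Experiment Setup row can be sketched as follows. This is a minimal illustration assuming a common parameterisation, k(x, y) = (1 + ||x − y||² / λ²)^(−β); the exact form of the paper's Equation (7), and the bandwidth collection KSDAGG aggregates over, should be checked against the paper and the authors' repository. The function name `imq_kernel` and the `bandwidth` parameter are illustrative, not from the paper's code.

```python
import numpy as np

def imq_kernel(x, y, bandwidth=1.0, beta=0.5):
    """Sketch of an IMQ kernel under an assumed common parameterisation:
    k(x, y) = (1 + ||x - y||^2 / bandwidth^2) ** (-beta).
    The paper sets beta_k = 0.5; the bandwidth is the parameter that
    KSDAGG aggregates over rather than selecting a single value."""
    sq_dist = np.sum((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return (1.0 + sq_dist / bandwidth**2) ** (-beta)

# The kernel equals 1 at x = y and decays toward 0 as points move apart.
x = np.array([0.0, 0.0])
print(imq_kernel(x, x))                    # identical inputs
print(imq_kernel(x, np.array([3.0, 4.0]))) # distant inputs give a smaller value
```

With β = 0.5 this kernel is bounded and decays slowly, which is part of why IMQ kernels are a standard choice for KSD goodness-of-fit tests.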