Efficient Aggregated Kernel Tests using Incomplete $U$-statistics
Authors: Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our claims with numerical experiments on the trade-off between computational efficiency and test power. In all three testing frameworks, the linear-time versions of our proposed tests perform at least as well as the current linear-time state-of-the-art tests. 8 Experiments For the two-sample problem, we consider testing samples drawn from a uniform density on [0, 1]d against samples drawn from a perturbed uniform density. ... Similar trends are observed across all our experiments in Figure 1, for the three testing frameworks, when varying the sample size, the dimension, and the difficulty of the problem (scale of perturbations or noise level). |
| Researcher Affiliation | Academia | Antonin Schrab, Centre for Artificial Intelligence, Gatsby Computational Neuroscience Unit, University College London & Inria London, a.schrab@ucl.ac.uk; Ilmun Kim, Department of Statistics & Data Science, Department of Applied Statistics, Yonsei University, ilmun@yonsei.ac.kr; Benjamin Guedj, Centre for Artificial Intelligence, University College London & Inria London, b.guedj@ucl.ac.uk; Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, arthur.gretton@gmail.com |
| Pseudocode | No | The paper describes computational procedures and statistical estimators using mathematical equations and textual descriptions, but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our implementation of the tests and code for reproducibility of the experiments are available online under the MIT license: https://github.com/antoninschrab/agginc-paper. |
| Open Datasets | Yes | For the two-sample problem, we consider testing samples drawn from a uniform density on [0, 1]d against samples drawn from a perturbed uniform density. ... For the goodness-of-fit problem, we use a Gaussian Bernoulli Restricted Boltzmann Machine as first considered by Liu et al. (2016) in this testing framework. ... we present experiments on the MNIST dataset (same trends are observed) |
| Dataset Splits | No | The paper mentions 'data splits' in the checklist section '3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix C.', but the main text of the paper does not explicitly provide percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | Yes | The total compute was 500 GPU hours (Nvidia A100 GPUs) on an internal cluster. |
| Software Dependencies | Yes | The code is written in Python 3.9 and uses the following libraries: NumPy 1.22.4, SciPy 1.8.0, PyTorch 1.11.0, Matplotlib 3.5.1, and Scikit-learn 1.0.2. |
| Experiment Setup | Yes | We use collections of 21 bandwidths for MMD and KSD and of 25 bandwidth pairs for HSIC; more details on the experiments (e.g. model and test parameters) are presented in Appendix C. We consider our incomplete aggregated tests MMDAggInc, HSICAggInc and KSDAggInc, with parameter R ∈ {1, ..., N−1} which fixes the deterministic design to consist of the first R subdiagonals of the N × N matrix, i.e. D := {(i, i+r) : i = 1, ..., N−r for r = 1, ..., R} with size \|D\| = RN − R(R+1)/2. We run our incomplete tests with R ∈ {1, 100, 200} and also the complete test using the full design D = i_2^N (all pairs of distinct indices). The power results are averaged over 100 repetitions and the runtimes over 20 repetitions. |
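
The deterministic design described in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration of the set D = {(i, i+r) : i = 1, ..., N−r for r = 1, ..., R} and its closed-form size, not the authors' implementation; the function name `incomplete_design` is ours:

```python
def incomplete_design(N, R):
    """Index pairs forming the first R subdiagonals of the N x N matrix.

    Uses 1-based indices, matching the paper's notation
    D = {(i, i+r) : i = 1..N-r, r = 1..R}.
    """
    return [(i, i + r) for r in range(1, R + 1) for i in range(1, N - r + 1)]


if __name__ == "__main__":
    N, R = 500, 100
    D = incomplete_design(N, R)
    # Subdiagonal r contributes N - r pairs, so summing over r = 1..R
    # gives |D| = R*N - R*(R+1)/2.
    assert len(D) == R * N - R * (R + 1) // 2
    print(len(D))  # 44950
```

Enumerating only the first R subdiagonals keeps |D| linear in N for fixed R, which is the source of the linear-time behaviour the paper reports for its incomplete tests.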