Efficient Aggregated Kernel Tests using Incomplete $U$-statistics
Authors: Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our claims with numerical experiments on the trade-off between computational efficiency and test power. In all three testing frameworks, the linear-time versions of our proposed tests perform at least as well as the current linear-time state-of-the-art tests. 8 Experiments For the two-sample problem, we consider testing samples drawn from a uniform density on [0, 1]d against samples drawn from a perturbed uniform density. ... Similar trends are observed across all our experiments in Figure 1, for the three testing frameworks, when varying the sample size, the dimension, and the difficulty of the problem (scale of perturbations or noise level). |
| Researcher Affiliation | Academia | Antonin Schrab, Centre for Artificial Intelligence, Gatsby Computational Neuroscience Unit, University College London & Inria London, a.schrab@ucl.ac.uk; Ilmun Kim, Department of Statistics & Data Science, Department of Applied Statistics, Yonsei University, ilmun@yonsei.ac.kr; Benjamin Guedj, Centre for Artificial Intelligence, University College London & Inria London, b.guedj@ucl.ac.uk; Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, arthur.gretton@gmail.com |
| Pseudocode | No | The paper describes computational procedures and statistical estimators using mathematical equations and textual descriptions, but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our implementation of the tests and code for reproducibility of the experiments are available online under the MIT license: https://github.com/antoninschrab/agginc-paper. |
| Open Datasets | Yes | For the two-sample problem, we consider testing samples drawn from a uniform density on [0, 1]d against samples drawn from a perturbed uniform density. ... For the goodness-of-fit problem, we use a Gaussian Bernoulli Restricted Boltzmann Machine as first considered by Liu et al. (2016) in this testing framework. ... we present experiments on the MNIST dataset (same trends are observed) |
| Dataset Splits | No | The paper mentions 'data splits' in the checklist section '3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix C.', but the main text of the paper does not explicitly provide percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | Yes | The total compute was 500 GPU hours (Nvidia A100 GPUs) on an internal cluster. |
| Software Dependencies | Yes | The code is written in Python 3.9 and uses the following libraries: NumPy 1.22.4, SciPy 1.8.0, PyTorch 1.11.0, Matplotlib 3.5.1, and Scikit-learn 1.0.2. |
| Experiment Setup | Yes | We use collections of 21 bandwidths for MMD and KSD and of 25 bandwidth pairs for HSIC; more details on the experiments (e.g. model and test parameters) are presented in Appendix C. We consider our incomplete aggregated tests MMDAggInc, HSICAggInc and KSDAggInc, with parameter R ∈ {1, ..., N−1} which fixes the deterministic design to consist of the first R subdiagonals of the N × N matrix, i.e. D := {(i, i+r) : i = 1, ..., N−r for r = 1, ..., R} with size \|D\| = RN − R(R+1)/2. We run our incomplete tests with R ∈ {1, 100, 200} and also the complete test using the full design D = i_2^N (all pairs of distinct indices). The power results are averaged over 100 repetitions and the runtimes over 20 repetitions. |
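
The deterministic design described in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration of the set D = {(i, i+r) : i = 1, ..., N−r for r = 1, ..., R} and its closed-form size, not the authors' implementation; the function name `incomplete_design` is ours:

```python
def incomplete_design(N, R):
    """Index pairs forming the first R subdiagonals of the N x N matrix.

    Uses 1-based indices, matching the paper's notation
    D = {(i, i+r) : i = 1..N-r, r = 1..R}.
    """
    return [(i, i + r) for r in range(1, R + 1) for i in range(1, N - r + 1)]


if __name__ == "__main__":
    N, R = 500, 100
    D = incomplete_design(N, R)
    # Subdiagonal r contributes N - r pairs, so summing over r = 1..R
    # gives |D| = R*N - R*(R+1)/2.
    assert len(D) == R * N - R * (R + 1) // 2
    print(len(D))  # 44950
```

Enumerating only the first R subdiagonals keeps |D| linear in N for fixed R, which is the source of the linear-time behaviour the paper reports for its incomplete tests.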