Fast Two-Sample Testing with Analytic Representations of Probability Measures
Authors: Kacper P. Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches, and in some cases, better outright power than even the most expensive quadratic-time tests. |
| Researcher Affiliation | Academia | Kacper Chwialkowski (Gatsby Computational Neuroscience Unit, UCL); Aaditya Ramdas (Dept. of EECS and Statistics, UC Berkeley); Dino Sejdinovic (Dept. of Statistics, University of Oxford); Arthur Gretton (Gatsby Computational Neuroscience Unit, UCL) |
| Pseudocode | No | The paper describes algorithms and tests using prose and mathematical equations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Real Data 1: Higgs dataset, D = 4, n varies, J = 10. The first experiment we consider is on the UCI Higgs dataset [18] described in [3]... Real Data 2: Amplitude Modulated Music, D = 1000, n = 10000, J = 10. [...] further details of these data are described in [11, Section 5]. |
| Dataset Splits | Yes | For all tests, the value of the scaling parameter γ was chosen so as to minimize a p-value estimate on a held-out training set: details are described in Appendix D. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Experimental setup. For all the experiments, D is the dimensionality of samples in a dataset, n is the number of samples in the dataset (sample size), and J is the number of test frequencies. Parameter selection is required for all the tests. The table summarizes the main choices of the parameters made for the experiments. The scalar γ represents the length-scale of the observed data. For all tests, the value of the scaling parameter γ was chosen so as to minimize a p-value estimate on a held-out training set: details are described in Appendix D. |
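
The parameter-selection step quoted in the Dataset Splits and Experiment Setup rows (choose γ to minimize a p-value estimate on a held-out training set) can be illustrated with a minimal sketch. This is not the authors' code, and the details of Appendix D are not reproduced here. The sketch assumes one-dimensional data, a single test location (J = 1), a Gaussian kernel with length-scale γ, and a chi-squared approximation with one degree of freedom for the p-value. The function names `gaussian_kernel`, `me_test_p_value`, and `select_gamma`, the candidate grid, and the synthetic data are all hypothetical.

```python
import math
import random


def gaussian_kernel(x, t, gamma):
    # Assumed kernel form: Gaussian in the distance to the test location t,
    # with gamma acting as the length-scale.
    return math.exp(-((x - t) ** 2) / (2.0 * gamma ** 2))


def me_test_p_value(xs, ys, location, gamma):
    # Differences of kernel features at one test location:
    # z_i = k(x_i, T) - k(y_i, T).
    n = min(len(xs), len(ys))
    z = [gaussian_kernel(xs[i], location, gamma)
         - gaussian_kernel(ys[i], location, gamma) for i in range(n)]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / (n - 1)
    if var <= 0.0:
        # Degenerate case (all differences identical): no evidence either way.
        return 1.0
    stat = n * mean * mean / var
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def select_gamma(train_x, train_y, location, candidates):
    # Pick the candidate length-scale with the smallest p-value estimate
    # on the held-out training split.
    return min(candidates,
               key=lambda g: me_test_p_value(train_x, train_y, location, g))


# Synthetic illustration: two unit-variance Gaussians whose means differ by 1.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [rng.gauss(1.0, 1.0) for _ in range(1000)]

# The first half is used only to choose gamma, the second half only for the test.
train_x, test_x = x[:500], x[500:]
train_y, test_y = y[:500], y[500:]

candidates = [0.1, 0.5, 1.0, 2.0, 5.0]
gamma = select_gamma(train_x, train_y, location=0.0, candidates=candidates)
p_value = me_test_p_value(test_x, test_y, location=0.0, gamma=gamma)
print("selected gamma:", gamma, "test p-value:", p_value)
```

Keeping the split used for selecting γ disjoint from the split used for the final test is the point of the held-out set: reusing the same samples for both would bias the reported p-value downward.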