A Wild Bootstrap for Degenerate Kernel Tests
Authors: Kacper P Chwialkowski, Dino Sejdinovic, Arthur Gretton
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler. Our tests outperform both the naive approach which neglects the dependence structure within the samples, and the approach of [4], when testing across multiple lags. |
| Researcher Affiliation | Academia | Kacper Chwialkowski, Department of Computer Science, University College London, Gower Street, London WC1E 6BT (kacper.chwialkowski@gmail.com); Dino Sejdinovic, Gatsby Computational Neuroscience Unit, UCL, 17 Queen Square, London WC1N 3AR (dino.sejdinovic@gmail.com); Arthur Gretton, Gatsby Computational Neuroscience Unit, UCL, 17 Queen Square, London WC1N 3AR (arthur.gretton@gmail.com) |
| Pseudocode | No | The paper describes methods mathematically but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/kacperChwialkowski/wildBootstrap. |
| Open Datasets | No | The paper synthesises or defines the data generation processes (e.g., 'synthesise the sounds', 'Extinct Gaussian autoregressive process', 'process sampled according to the dynamics proposed by [4]') rather than using pre-existing public datasets with explicit access information. |
| Dataset Splits | No | The paper discusses sample sizes for experiments (e.g., 'sample size=500', 'sample sizes are (nx, ny) = {(300, 200), (600, 400), (900, 600)}', 'n is the sample size') but does not specify explicit train/validation/test dataset splits as typically found in machine learning model development. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library version numbers required for replication. |
| Experiment Setup | Yes | MCMC: sample size = 500; a Gaussian kernel with bandwidth σ = 1.7 is used; every second Gibbs sample is kept (i.e., after a pass through both dimensions). Audio: sample sizes are (nx, ny) ∈ {(300, 200), (600, 400), (900, 600)}; a Gaussian kernel with bandwidth σ = 14 is used. Both: the wild bootstrap uses a block size of ln = 20; results are averaged over at least 200 trials. In lag-HSIC, the number of lags under examination was max{10, log n}, where n is the sample size. We used Gaussian kernels with widths estimated by the median heuristic. The cumulative distribution of the V-statistics was approximated by samples of the bootstrapped statistic nV_b2; to model the tail of this distribution, we fitted the generalized Pareto distribution to the bootstrapped samples. |
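
To make the setup in the last row concrete, below is a minimal Python/NumPy/SciPy sketch of the wild bootstrap null for a degenerate V-statistic, using the same ingredients the paper reports: an autoregressive wild bootstrap process with block parameter ln = 20, Gaussian kernels with median-heuristic bandwidths, and a generalized Pareto fit to the tail of the bootstrap distribution. This is an illustration based on the paper's description, not the authors' released code; the function names (`wild_bootstrap_multipliers`, `wild_bootstrap_null`, `p_value_gpd_tail`) and defaults are ours.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import genpareto

def median_heuristic(X):
    """Median-heuristic bandwidth: median of nonzero pairwise distances."""
    d = cdist(X, X, "euclidean")
    return np.median(d[d > 0])

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) Gram matrix with bandwidth sigma."""
    return np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))

def wild_bootstrap_multipliers(n, ln=20.0, rng=None):
    """AR(1) wild bootstrap process with block parameter ln (our reading of
    the paper): W_t = exp(-1/ln) * W_{t-1} + sqrt(1 - exp(-2/ln)) * eps_t,
    eps_t ~ N(0, 1). ln sets how long the multipliers stay correlated,
    mimicking the dependence structure of the observed time series."""
    rng = np.random.default_rng(rng)
    a = np.exp(-1.0 / ln)
    eps = rng.standard_normal(n)
    w = np.empty(n)
    w[0] = eps[0]  # stationary N(0, 1) start
    for t in range(1, n):
        w[t] = a * w[t - 1] + np.sqrt(1.0 - a**2) * eps[t]
    return w

def wild_bootstrap_null(H, ln=20.0, n_boot=2000, rng=None):
    """Bootstrap samples of a degenerate V-statistic nV = (1/n) sum_ij H_ij,
    replacing each term by W_i * W_j * H_ij (the nV_b2 statistic above)."""
    rng = np.random.default_rng(rng)
    n = H.shape[0]
    out = np.empty(n_boot)
    for b in range(n_boot):
        w = wild_bootstrap_multipliers(n, ln, rng)
        out[b] = w @ H @ w / n
    return out

def p_value_gpd_tail(stat, boot, tail_frac=0.1):
    """Empirical p-value, with a generalized Pareto distribution fitted to
    exceedances over a high threshold to model the tail more accurately."""
    thresh = np.quantile(boot, 1.0 - tail_frac)
    if stat <= thresh:
        return float(np.mean(boot >= stat))
    c, loc, scale = genpareto.fit(boot[boot > thresh] - thresh, floc=0.0)
    return float(tail_frac * genpareto.sf(stat - thresh, c, loc=loc, scale=scale))
```

As a usage sketch (again our illustration, not a line-by-line transcription of the released code): for an HSIC-type independence test on paired series, take `H` as the elementwise product of the doubly centred Gram matrices of the two series, compute `stat = H.sum() / len(H)`, draw `boot = wild_bootstrap_null(H)`, and report `p_value_gpd_tail(stat, boot)`. The block parameter `ln` plays the role described above: it controls how slowly the multipliers decorrelate, so the bootstrap respects the temporal dependence that a naive permutation test would destroy.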