Efficient Statistical Tests: A Neural Tangent Kernel Approach
Authors: Sheng Jia, Ehsan Nezhadarya, Yuhuai Wu, Jimmy Ba
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main experiments for two-sample tests are carried out on two sets of samples, with S_P drawn from the MNIST (LeCun et al., 1998) or CIFAR10 (Krizhevsky et al., 2009) datasets and S_Q as used in two prior works (Liu et al., 2020; Rabanser et al., 2019). We compare our method, which we call MMD-SCNTK, against the recently proposed two-sample test methods from these two prior works in their image domains. (A generic sketch of an MMD permutation test is given after this table.) |
| Researcher Affiliation | Collaboration | ¹University of Toronto, ²Vector Institute, ³LG Electronics. Correspondence to: Sheng Jia <sheng@cs.toronto.edu>. |
| Pseudocode | No | No structured pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | Source code: https://github.com/Sheng-J/scntk |
| Open Datasets | Yes | Our main experiments for two-sample tests are carried out on two sets of samples, with S_P drawn from the MNIST (LeCun et al., 1998) or CIFAR10 (Krizhevsky et al., 2009) datasets and S_Q as used in two prior works (Liu et al., 2020; Rabanser et al., 2019). |
| Dataset Splits | No | The paper discusses how the baselines use held-out data for training, fine-tuning, or kernel optimization, but it does not specify a training/validation split for the proposed SCNTK method beyond noting that, because SCNTK requires no training phase, more samples can be devoted to testing. No explicit details are given on how the data was split for training or validation in the authors' own setup. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | Network architecture for SCNTK. We aim to be consistent with the network architectures used in previous works. The two main differences are: (1) we replace the ReLU activations of the first layer with cosine activations; (2) to keep our NTK kernel closer to its deterministic form, we use a width of 300, which is wider than the widths of 32 and 64 used in those baselines. Further details are provided in the Appendix. Choosing the width: As the SCNTK in Equation 4.6 only converges to the deterministic form in the large-width limit, we empirically investigate the trend in two-sample test performance as we increase the width, i.e., the number of channels for a convolutional network. Table 1 compares the results of the MNIST experiments when varying the width of the network. We see that the performance is relatively stable with widths of 200, 300, and 500, so we chose a width of 300 for our experiments in Table 2. Convolutions with a stride of 2 are used for all the layers. The width, i.e., the number of channels, is set to 300. More details are provided in the Appendix. For both the SCNTK and SRF methods, the default bandwidth of 1.0 is used. (A hedged sketch of this architecture follows the table.) |
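The architecture details quoted in the Experiment Setup row (a cosine activation in place of ReLU in the first layer, stride-2 convolutions throughout, and a width of 300 channels) suggest the rough PyTorch sketch below. This is not the authors' released code (see the repository linked above for that); the number of layers, kernel size, and MNIST-sized input are assumptions made only for illustration.

```python
# Hypothetical sketch of an SCNTK-style feature network (not the authors' code).
# Only the cosine first-layer activation, stride-2 convolutions, and width 300
# come from the quoted setup; depth, kernel size, and input shape are assumed.
import torch
import torch.nn as nn


class CosineActivation(nn.Module):
    """Element-wise cosine, replacing the ReLU of the first layer."""
    def forward(self, x):
        return torch.cos(x)


def make_scntk_net(in_channels=1, width=300):
    # Stride-2 convolutions in every layer; width = number of channels = 300.
    return nn.Sequential(
        nn.Conv2d(in_channels, width, kernel_size=3, stride=2, padding=1),
        CosineActivation(),   # (1) cosine replaces ReLU in the first layer
        nn.Conv2d(width, width, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(width, width, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Flatten(),
    )


net = make_scntk_net(in_channels=1, width=300)   # e.g. MNIST: 1 input channel
features = net(torch.randn(8, 1, 28, 28))        # MNIST-sized dummy batch
print(features.shape)                            # torch.Size([8, 4800])
```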
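For the test protocol itself, below is a minimal, generic MMD permutation two-sample test in NumPy. The paper's MMD-SCNTK uses the SCNTK kernel in place of the Gaussian kernel shown here; the bandwidth of 1.0 only mirrors the default reported above, and the sample sizes and permutation count are illustrative choices, not the authors' settings.

```python
# Generic kernel MMD permutation test (illustrative only; MMD-SCNTK would
# substitute the SCNTK kernel matrix for the Gaussian kernel used here).
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2(K, n):
    # Biased MMD^2 estimate from the joint kernel matrix over [X; Y],
    # where the first n rows/columns correspond to samples from P.
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()


def mmd_permutation_test(X, Y, n_perms=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.concatenate([X, Y], axis=0)
    K = gaussian_kernel(Z, Z)
    n = len(X)
    observed = mmd2(K, n)
    null_stats = []
    for _ in range(n_perms):
        perm = rng.permutation(len(Z))
        null_stats.append(mmd2(K[perm][:, perm], n))  # reshuffle under H0: P = Q
    p_value = (np.sum(np.array(null_stats) >= observed) + 1) / (n_perms + 1)
    return observed, p_value, p_value < alpha


X = np.random.randn(50, 10)            # toy samples standing in for S_P
Y = np.random.randn(50, 10) + 0.5      # toy samples standing in for S_Q (shifted)
stat, p, reject = mmd_permutation_test(X, Y)
print(f"MMD^2 = {stat:.4f}, p = {p:.3f}, reject H0: {reject}")
```

The permutation step re-splits the pooled samples at random to calibrate the rejection threshold without any training phase, which echoes the point in the Dataset Splits row that SCNTK can devote all samples to testing.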