Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Permutation-Free Kernel Independence Test

Authors: Shubhanshu Shekhar, Ilmun Kim, Aaditya Ramdas

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental
"In this section, we experimentally validate the theoretical results presented in the previous sections. The code for reproducing these results is available in the repository: https://github.com/sshekhar17/PermFreeHSIC. Numerical simulations demonstrate that, compared to the permutation tests, our variants have the same power within a constant factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck."
Researcher Affiliation: Academia
"Shubhanshu Shekhar, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Ilmun Kim, Department of Statistics and Data Science / Department of Applied Statistics, Yonsei University, Seodaemun-gu, Seoul 03722, Republic of Korea; Aaditya Ramdas, Department of Statistics and Data Science / Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA"
Pseudocode: No
"The paper describes mathematical derivations and methodological steps for the cross-HSIC test, but it does not include a distinct block explicitly labeled 'Pseudocode' or 'Algorithm'."
Open Source Code: Yes
"In this section, we experimentally validate the theoretical results presented in the previous sections. The code for reproducing these results is available in the repository: https://github.com/sshekhar17/PermFreeHSIC."
Open Datasets: No
"In this section, we experimentally validate the theoretical results presented in the previous sections. The code for reproducing these results is available in the repository: https://github.com/sshekhar17/PermFreeHSIC. In the first experiment, we verify the claim of Theorem 6, which states that a finite second moment of the kernel is sufficient for the asymptotic normality of the xHSIC_n statistic under the null. In particular, we use linear kernels k and ℓ, and consider the case where P_X and P_Y are distributions on R^d, with each component drawn independently from a t-distribution with dof degrees of freedom. The observations are drawn from independent multivariate Gaussian distributions with unit covariance matrix. The results, plotted in Figure 3, show that, as expected, the null distribution of xHSIC_n approaches the standard normal distribution, even for relatively small sample sizes (all the plots have n = 200)."
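The null setup quoted above (independent X and Y with i.i.d. t-distributed coordinates, and linear kernels) can be sketched as below. The xHSIC_n statistic itself is defined in the paper and its repository; this sketch covers only the data generation and the linear-kernel Gram matrices, and the function name `sample_null_data` and the parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np

def sample_null_data(n, d, dof, rng):
    # Under the null, X and Y are independent; each coordinate is an
    # i.i.d. draw from a t-distribution with `dof` degrees of freedom.
    X = rng.standard_t(dof, size=(n, d))
    Y = rng.standard_t(dof, size=(n, d))
    return X, Y

rng = np.random.default_rng(0)
X, Y = sample_null_data(n=200, d=5, dof=3, rng=rng)

# With linear kernels k(x, x') = <x, x'>, the Gram matrices are
# plain inner-product matrices.
K = X @ X.T
L = Y @ Y.T
```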
Dataset Splits: No
"The paper primarily discusses splitting the data D_{2n} into two equal parts, D_n and D_{n+1}^{2n}, as part of its proposed sample-splitting methodology for constructing the cross-HSIC statistic, not as a standard training/validation/test split for evaluating models on a specific dataset. Experiments are performed on synthetic data with specified parameters (e.g., distributions such as the t-distribution or a multivariate Gaussian), which does not typically involve explicit train/test/validation splits."
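The sample splitting mentioned here, D_{2n} into D_n and D_{n+1}^{2n}, amounts to an even split of the 2n paired observations. A minimal sketch, with the helper name `split_in_half` chosen for illustration:

```python
import numpy as np

def split_in_half(X, Y):
    # Split the 2n paired observations into the first n (D_n) and
    # the last n (D_{n+1}^{2n}), as in a sample-splitting construction.
    n2 = X.shape[0]
    assert n2 % 2 == 0, "expects an even number of observations"
    n = n2 // 2
    return (X[:n], Y[:n]), (X[n:], Y[n:])

X = np.arange(8, dtype=float).reshape(4, 2)
Y = np.arange(8, 16, dtype=float).reshape(4, 2)
first, second = split_in_half(X, Y)
```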
Hardware Specification: No
"The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU models, CPU types, or cloud computing resources."
Software Dependencies: No
"The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation or experimentation."
Experiment Setup: Yes
"For this experiment, we set P_X to a multivariate Gaussian distribution in d dimensions with identity covariance matrix, and then generated Y as Y = ε X^b + (1 − ε)(X′)^b, for b > 0. In the above display, the exponentiation is done component-wise and X′ is an independent copy of X. As shown in Figure 5, our cross-HSIC test is slightly less powerful than the HSIC permutation test across different scenarios. This power loss can be attributed to a less efficient use of the data due to sample splitting. Furthermore, we define k_n(x, x′) = exp(−c_n ‖x − x′‖²) and ℓ_n(y, y′) = exp(−c_n ‖y − y′‖²), where we have overloaded ‖·‖ to represent the Euclidean norm on both X and Y. In all figures, we have d = 10 and b = 2, while ε is set to 0.3, 0.4, and 0.5 in the three columns."