Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Permutation-Free Kernel Independence Test

Authors: Shubhanshu Shekhar, Ilmun Kim, Aaditya Ramdas

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental
"In this section, we experimentally validate the theoretical results presented in the previous sections. The code for reproducing these results is available in the repository: https://github.com/sshekhar17/PermFreeHSIC. Numerical simulations demonstrate that, compared to the permutation tests, our variants have the same power within a constant factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck."
Researcher Affiliation: Academia
"Shubhanshu Shekhar, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Ilmun Kim, Department of Statistics and Data Science / Department of Applied Statistics, Yonsei University, Seodaemun-gu, Seoul 03722, Republic of Korea; Aaditya Ramdas, Department of Statistics and Data Science / Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA"
Pseudocode: No
"The paper describes mathematical derivations and methodological steps for the cross-HSIC test, but it does not include a distinct block explicitly labeled 'Pseudocode' or 'Algorithm'."
Open Source Code: Yes
"In this section, we experimentally validate the theoretical results presented in the previous sections. The code for reproducing these results is available in the repository: https://github.com/sshekhar17/PermFreeHSIC."
Open Datasets: No
"In this section, we experimentally validate the theoretical results presented in the previous sections. The code for reproducing these results is available in the repository: https://github.com/sshekhar17/PermFreeHSIC. In the first experiment, we verify the claim of Theorem 6, which states that a finite second moment of the kernel is sufficient for the asymptotic normality of the xHSIC_n statistic under the null. In particular, we use linear kernels k and ℓ, and consider the case where P_X and P_Y are distributions on R^d, with each component drawn independently from a t-distribution with dof degrees of freedom. The observations are drawn from independent multivariate Gaussian distributions with unit covariance matrix. The results, plotted in Figure 3, show that, as expected, the null distribution of xHSIC_n approaches the standard normal distribution, even for relatively small sample sizes (all the plots have n = 200)."
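The null setup quoted above (independent X and Y with i.i.d. t-distributed coordinates, and linear kernels) can be sketched as below. The xHSIC_n statistic itself is defined in the paper and its repository; this sketch covers only the data generation and the linear-kernel Gram matrices, and the function name `sample_null_data` and the parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np

def sample_null_data(n, d, dof, rng):
    # Under the null, X and Y are independent; each coordinate is an
    # i.i.d. draw from a t-distribution with `dof` degrees of freedom.
    X = rng.standard_t(dof, size=(n, d))
    Y = rng.standard_t(dof, size=(n, d))
    return X, Y

rng = np.random.default_rng(0)
X, Y = sample_null_data(n=200, d=5, dof=3, rng=rng)

# With linear kernels k(x, x') = <x, x'>, the Gram matrices are
# plain inner-product matrices.
K = X @ X.T
L = Y @ Y.T
```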
Dataset Splits: No
"The paper primarily discusses splitting the data D_{2n} into two equal parts, D_n and D_{n+1}^{2n}, as part of its proposed sample-splitting methodology for constructing the cross-HSIC statistic, not as a standard training/validation/test split for evaluating models on a specific dataset. Experiments are performed on synthetic data with specified parameters (e.g., distributions such as the t-distribution or a multivariate Gaussian), which does not typically involve explicit train/test/validation splits."
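The sample splitting mentioned here, D_{2n} into D_n and D_{n+1}^{2n}, amounts to an even split of the 2n paired observations. A minimal sketch, with the helper name `split_in_half` chosen for illustration:

```python
import numpy as np

def split_in_half(X, Y):
    # Split the 2n paired observations into the first n (D_n) and
    # the last n (D_{n+1}^{2n}), as in a sample-splitting construction.
    n2 = X.shape[0]
    assert n2 % 2 == 0, "expects an even number of observations"
    n = n2 // 2
    return (X[:n], Y[:n]), (X[n:], Y[n:])

X = np.arange(8, dtype=float).reshape(4, 2)
Y = np.arange(8, 16, dtype=float).reshape(4, 2)
first, second = split_in_half(X, Y)
```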
Hardware Specification: No
"The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU models, CPU types, or cloud computing resources."
Software Dependencies: No
"The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation or experimentation."
Experiment Setup: Yes
"For this experiment, we set P_X to a multivariate Gaussian distribution in d dimensions with identity covariance matrix, and then generated Y as Y = ε X^b + (1 − ε)(X′)^b, for b > 0. In the above display, the exponentiation is done component-wise and X′ is an independent copy of X. As shown in Figure 5, our cross-HSIC test is slightly less powerful than the HSIC permutation test across different scenarios. This power loss can be attributed to a less efficient use of the data due to sample splitting. Furthermore, we define k_n(x, x′) = exp(−c_n ‖x − x′‖²) and ℓ_n(y, y′) = exp(−c_n ‖y − y′‖²), where we have overloaded ‖·‖ to represent the Euclidean norm on both X and Y. In all figures, we have d = 10 and b = 2, while ε is set to 0.3, 0.4, and 0.5 in the three columns."