Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Kernel conditional tests from learning-theoretic bounds

Authors: Pierre-François Massiani, Christian Fiedler, Lukas Haverbeck, Friedrich Solowjow, Sebastian Trimpe

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Numerical results. We begin with an example illustrating the different components. We then evaluate performance (type I and II errors) in controlled settings compared to the baseline of Hu and Lei [5]. Next, we illustrate benefits of our test and its pointwise answers compared to global conditional tests, thanks to the covariate rejection region and the fact that we obtain lower type II error when the tested functions differ only in rarely-sampled covariate regions. Finally, we showcase an application on change detection for a linear dynamical system. We remind the reader that Appendices A and B contain further numerical studies respectively comparing more general functionals (such as the two-sample one) and investigating our bootstrapping schemes.
Researcher Affiliation	Academia	(1) Institute for Data Science in Mechanical Engineering, RWTH Aachen University; (2) Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, and Munich Center for Machine Learning (MCML)
Pseudocode	Yes	C Summary of algorithms. Algorithm 1 summarizes our implementation of the test of Theorem 4.4. The bootstrapping of the multiplicative constants $\beta_i$, $i \in \{1, 2\}$, it relies on is detailed in Appendix B, and summarized in Algorithms 2 and 3. We emphasize that, in principle, any other bootstrapping algorithm for KRR can be used. All routines assume the form $K = k\,\mathrm{id}_{\kappa}$ (cf. Appendix A). Algorithm 4 shows how to compute the test statistic, called the conditional maximum mean discrepancy (CMMD).
Open Source Code	Yes	Additionally, the code to reproduce all experimental results is available at https://github.com/Data-Science-in-Mechanical-Engineering/conditional-test.
Open Datasets	No	G.2.1 General setup. Data generation: We evaluate our test by generating two data sets, $D_1$ and $D_2$, each containing $n \in \mathbb{N}$ transition pairs of the form $D_i = \{(x_j^{(i)}, f_i(x_j^{(i)}) + \epsilon_j^{(i)})\}_{j=1}^{n}$, $i \in \{1, 2\}$.
Dataset Splits	No	To compute the positive rate in practice, we draw $T$ independent data set pairs $\{(D_1^{(j)}, D_2^{(j)})\}_{j=1}^{T}$. For each pair, we take the observed covariates $S^{(j)} = \{x \in \mathcal{X} \mid \exists z \in \mathcal{Z},\ (x, z) \in D_1^{(j)} \cup D_2^{(j)}\}$ (Eq. 34) as the region of interest and compute the covariate rejection region $\hat{\chi}(D_1^{(j)}, D_2^{(j)}) := \chi(D_1^{(j)}, D_2^{(j)}) \cap S^{(j)} = \{x \in S \mid \mathcal{T}(x, D_1, D_2) = 1\}$.
Hardware Specification	Yes	All experiments in this section were conducted on an Intel Xeon 8468 Sapphire CPU, using 10 GB of RAM.
Software Dependencies	No	The paper does not specify software dependencies with version numbers.
Experiment Setup	Yes	G Experiment details. This section presents the detailed setups for each of our numerical experiments. Additionally, the code to reproduce all experimental results is available at https://github.com/Data-Science-in-Mechanical-Engineering/conditional-test. Unless stated, all our experiments with bootstrapped test thresholds use the naive resampling scheme outlined in Appendix B. Table 2: Hyperparameters used in generating Figure 1. Input set: $\mathcal{X} = [-1, 1]$; input kernel bandwidth: $\gamma^2 = 0.25$; data set size: $n = 25$; noise variance: $s^2 = 0.05^2$; regularization: $\lambda = 0.01$; bootstrap resamples: $M = 1000$.
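
The data-generation step quoted under Open Datasets and the observed-covariate set quoted under Dataset Splits can be sketched as follows. This is a minimal illustration and not the authors' code: the functions f1 and f2, the uniform sampling of covariates, and the Gaussian noise model are hypothetical choices made here, and the reading of the noise variance as 0.05 squared is an assumption about the Table 2 excerpt.

```python
import math
import random

# Values read from the Table 2 excerpt; NOISE_STD assumes s^2 = 0.05^2.
N_SAMPLES = 25
NOISE_STD = 0.05
X_LOW, X_HIGH = -1.0, 1.0


def generate_dataset(f, n=N_SAMPLES, noise_std=NOISE_STD, rng=None):
    """Return n pairs (x, f(x) + eps) with x in [X_LOW, X_HIGH].

    Uniform covariates and Gaussian noise are assumptions of this sketch;
    the quoted excerpt does not state the sampling distributions.
    """
    if rng is None:
        rng = random.Random()
    data = []
    for _ in range(n):
        x = rng.uniform(X_LOW, X_HIGH)
        y = f(x) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


def f1(x):
    # Hypothetical ground-truth function for the first data set.
    return math.sin(math.pi * x)


def f2(x):
    # Hypothetical ground-truth function for the second data set.
    return math.sin(math.pi * x) + 0.5 * x


def observed_covariates(d1, d2):
    """Covariates appearing in either data set (the set S in the excerpt)."""
    return sorted({x for x, _ in d1} | {x for x, _ in d2})


rng = random.Random(0)  # fixed seed so the sketch is repeatable
d1 = generate_dataset(f1, rng=rng)
d2 = generate_dataset(f2, rng=rng)
region = observed_covariates(d1, d2)
```

In the quoted procedure, the test would then be evaluated at each point of this region to obtain the covariate rejection region; that step depends on the paper's Algorithms 1 to 4 and is not reproduced here.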