Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

DUAL: Learning Diverse Kernels for Aggregated Two-sample and Independence Testing

Authors: Zhijian Zhou, Xunye Tian, Liuhua Peng, Chao Lei, Antonin Schrab, Danica J. Sutherland, Feng Liu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Lastly, we conducted extensive empirical experiments demonstrating the superior performance of our proposed approach across various benchmarks for both two-sample and independence testing.
Researcher Affiliation	Academia	University of Melbourne; University of Cambridge; UBC & Amii
Pseudocode	No	The paper describes methods and procedures in narrative text, without including a dedicated pseudocode or algorithm block.
Open Source Code	Yes	Code: https://github.com/tmlr-group/MMD-HSIC-DUAL.
Open Datasets	Yes	For two-sample testing, we use three datasets: a frequently used synthetic BLOB dataset [45, 51, 5, 6], the MNIST (versus generative adversarial model DCGAN [60]) dataset [6 8], and the ImageNet (versus ImageNetV2 [61]) dataset [6, 46]. For independence testing, we consider the Higgs dataset (a high-dimensional physics dataset) [62], MNIST, and CIFAR-10.
Dataset Splits	Yes	We partition the dataset into a training set, $W_{tr} = \{w_{tr,i}\}_{i=1}^{n}$, and a testing set, $W_{te} = \{w_{te,i}\}_{i=1}^{n}$. For notational convenience, we assume both sets contain n elements. [...] All the sample sizes n in the Figure 3 represents the number of samples we use in both the training phase and testing phase. Thus, for baselines without data-splitting (e.g., MMDAgg, HSICAgg, FSIC and MMD-FUSE), we use 2n samples to ensure the fairness that there are same total samples included in the whole testing experiments.
Hardware Specification	Yes	The experiments of the work are conducted on two platforms. One platform is an Nvidia-4090 GPU PC with Pytorch framework. Another platform is a High-performance Computer cluster with several Nvidia-A100 GPUs with Pytorch framework. The memory of two platforms are both 64 GB. The storage of disk of two platforms are both over 4 TB.
Software Dependencies	No	The paper mentions 'Pytorch framework' but does not specify its version number or versions for any other key software libraries.
Experiment Setup	Yes	The learning rate is 5e-4 for BLOB and 5e-5 for MNIST and ImageNet. For all the two-sample testing experiments, we conduct each experiment with ten different seeds, and for each seed, we perform the testing data selection and two-sample testing process for 100 times. In total, the results are all averaging over 1,000 repetitions.
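
The repetition scheme quoted in the Dataset Splits and Experiment Setup rows (ten seeds, 100 test repetitions per seed, n training and n testing samples) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the kernel-learning step on the training half is omitted, and a simple permutation test on the difference of means stands in for the paper's test purely to make the loop runnable.

```python
import random

NUM_SEEDS = 10        # "ten different seeds" (from the quoted setup)
REPS_PER_SEED = 100   # "100 times" per seed (from the quoted setup)


def permutation_mean_test(x, y, rng, num_permutations=100, alpha=0.05):
    """Stand-in two-sample test (NOT the paper's method).

    Rejects when the permutation p-value of |mean(x) - mean(y)| is <= alpha.
    """
    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (num_permutations + 1)
    return p_value <= alpha


def rejection_rate(draw_p, draw_q, n, num_seeds=NUM_SEEDS,
                   reps_per_seed=REPS_PER_SEED, num_permutations=100):
    """Return (average rejection rate, total number of repetitions)."""
    rejections = 0
    for seed in range(num_seeds):
        rng = random.Random(seed)
        # Training half: n points per distribution, drawn once per seed.
        # A kernel-learning method would fit on these; the stand-in ignores them.
        train_p = [draw_p(rng) for _ in range(n)]
        train_q = [draw_q(rng) for _ in range(n)]
        for _ in range(reps_per_seed):
            # Testing half: a fresh selection of n points per distribution.
            test_p = [draw_p(rng) for _ in range(n)]
            test_q = [draw_q(rng) for _ in range(n)]
            rejections += permutation_mean_test(
                test_p, test_q, rng, num_permutations)
    total = num_seeds * reps_per_seed
    return rejections / total, total
```

With the default arguments the loop performs 10 x 100 = 1,000 repetitions, matching the count in the quoted setup. A call such as `rejection_rate(lambda r: r.gauss(0, 1), lambda r: r.gauss(1, 1), n=30)` would estimate the stand-in test's power on a unit mean shift. A baseline without data splitting would, per the quoted text, receive all 2n samples at test time instead.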