Adversarial Attack and Defense for Non-Parametric Two-Sample Tests

Authors: Xilie Xu, Jingfeng Zhang, Feng Liu, Masashi Sugiyama, Mohan Kankanhalli

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on both simulated and real-world datasets validate the adversarial vulnerabilities of non-parametric TSTs and the effectiveness of our proposed defense.
Researcher Affiliation | Academia | 1 School of Computing, National University of Singapore; 2 RIKEN Center for Advanced Intelligence Project (AIP); 3 School of Mathematics and Statistics, The University of Melbourne; 4 Graduate School of Frontier Sciences, The University of Tokyo.
Pseudocode | Yes | Algorithm 1: Ensemble Attack (EA); Algorithm 2: Adversarially Learning Deep Kernels; Algorithm 3: Testing with k_θ on S_P and S_Q.
Open Source Code | Yes | Source code is available at https://github.com/GodXuxilie/Robust-TST.git.
Open Datasets | Yes | We conduct six typical non-parametric TSTs (MMD-D, MMD-G, C2ST-S, C2ST-L, ME and SCF) under EA on five benchmark datasets: Blob (Gretton et al., 2012; Jitkrittum et al., 2016; Sutherland et al., 2017), high-dimensional Gaussian mixture (HDGM) (Liu et al., 2020a), Higgs (Chwialkowski et al., 2015), MNIST (LeCun et al., 1998; Radford et al., 2015) and CIFAR-10 (Krizhevsky, 2009). ... The Higgs dataset can be downloaded from the UCI Machine Learning Repository. ... The CIFAR-10 dataset can be downloaded via PyTorch (Paszke et al., 2019). (A download sketch follows after this table.)
Dataset Splits | No | The paper explicitly defines training data (e.g., "For Blob, HDGM and Higgs, we randomly sample a training pair (S_P^tr, S_Q^tr)...") and test data ("we randomly sample 100 new pairs (S_P^te, S_Q^te), disjoint from the training data, as the benign test pairs."). However, it does not explicitly mention a separate validation set or a specific validation split used for hyperparameter tuning or model selection during training. (A sampling sketch follows after this table.)
Hardware Specification | Yes | We conduct all experiments on Python 3.8 (PyTorch 1.1) with NVIDIA RTX A5000 GPUs.
Software Dependencies | Yes | We conduct all experiments on Python 3.8 (PyTorch 1.1) with NVIDIA RTX A5000 GPUs.
Experiment Setup | Yes | The training settings (e.g., the structure of the neural network and the optimizer) follow Liu et al. (2020a) and are illustrated in detail in Appendix E.2. ... We use the Adam optimizer (Kingma & Ba, 2015)... We set the drop-out rate to zero... We set the number of training samples n_tr to 100 for Blob, 3,000 for HDGM, 5,000 for Higgs, and 500 for MNIST and CIFAR-10. ... For C2ST-S and C2ST-L, we set the batch size to 128 for Blob, HDGM and Higgs, and 100 for MNIST and CIFAR-10. We set the number of training epochs to 9000 n_te/batchsize for Blob, 1,000 for HDGM and Higgs, and 2,000 for MNIST and CIFAR-10. We set the learning rate to 0.001 for Blob, HDGM and Higgs, and 0.0002 for MNIST and CIFAR-10. (A configuration sketch follows after this table.)
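
As a convenience for the Open Datasets entry above, here is a minimal sketch of fetching MNIST and CIFAR-10 through PyTorch's torchvision package, which the paper points to for CIFAR-10; the root directory and transform are assumptions, not taken from the paper, and the Higgs dataset must be obtained separately from the UCI Machine Learning Repository.

    # Minimal sketch: MNIST and CIFAR-10 via torchvision (root path and transform are assumed).
    import torchvision
    import torchvision.transforms as transforms

    transform = transforms.ToTensor()
    mnist = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
    cifar10 = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)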
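
The train/test protocol quoted under Dataset Splits (one training pair plus 100 benign test pairs disjoint from the training data) could be realized along the lines of the sketch below; the function name and index bookkeeping are hypothetical and only illustrate the described sampling, not the authors' actual code.

    # Hedged sketch: sample one training pair (S_P^tr, S_Q^tr) and n_pairs test pairs
    # (S_P^te, S_Q^te) whose indices are disjoint from the training indices.
    import numpy as np

    def sample_pairs(X_P, X_Q, n_tr, n_te, n_pairs=100, seed=0):
        rng = np.random.default_rng(seed)
        idx_P, idx_Q = rng.permutation(len(X_P)), rng.permutation(len(X_Q))
        S_P_tr, S_Q_tr = X_P[idx_P[:n_tr]], X_Q[idx_Q[:n_tr]]
        test_pairs = []
        for _ in range(n_pairs):
            te_P = rng.choice(idx_P[n_tr:], size=n_te, replace=False)
            te_Q = rng.choice(idx_Q[n_tr:], size=n_te, replace=False)
            test_pairs.append((X_P[te_P], X_Q[te_Q]))
        return (S_P_tr, S_Q_tr), test_pairs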
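
The hyperparameters quoted under Experiment Setup can be collected into a small configuration sketch. Only the numeric values come from the quoted text; the dictionary names, the model argument, and the helper function are assumptions introduced for illustration.

    # Sketch of the reported per-dataset training settings (values from the quoted setup).
    import torch

    N_TRAIN = {"Blob": 100, "HDGM": 3000, "Higgs": 5000, "MNIST": 500, "CIFAR10": 500}
    LEARNING_RATE = {"Blob": 1e-3, "HDGM": 1e-3, "Higgs": 1e-3, "MNIST": 2e-4, "CIFAR10": 2e-4}
    BATCH_SIZE_C2ST = {"Blob": 128, "HDGM": 128, "Higgs": 128, "MNIST": 100, "CIFAR10": 100}

    def make_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.Optimizer:
        # Adam optimizer as stated in the paper; drop-out is set to zero in the network itself.
        return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE[dataset])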