Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Anchor-based Maximum Discrepancy for Relative Similarity Testing

Authors: Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets.
Researcher Affiliation	Academia	Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu; The University of Melbourne, Australia
Pseudocode	Yes	Algorithm 1 The AMD test
Open Source Code	Yes	Codes are publicly available at: https://github.com/tmlr-group/AMD.
Open Datasets	Yes	Following the setups in [13, 14], we adapt MNIST and CIFAR10 as benchmark datasets, both of which comprise original and generative images. [...] Specifically, we randomly draw sample from HC3 dataset,
Dataset Splits	Yes	In the testing procedure, we use testing samples Z' = {z'_i}_{i=1}^m ∼ U^m, X' = {x'_i}_{i=1}^m ∼ P^m and Y' = {y'_i}_{i=1}^m ∼ Q^m, which are drawn independently of Z, X, and Y. This follows the data-splitting strategy commonly used in kernel-based hypothesis testing to ensure the validity of the test [21].
Hardware Specification	No	The paper does not explicitly mention specific hardware models (e.g., GPU/CPU models) used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup	Yes	We set the sample size to 50 for CIFAR10 and 160 for MNIST. We denote by P the original images and Q the generative images, and set the U as U = νP + (1 − ν)Q with ν ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. [...] Figure 3 illustrates the impact of the regularization parameter λ on optimization with data augmentation. The results indicate that λ can be selected within a relatively broad range, specifically [10^−8, 10^3] for CIFAR10 and [10^−8, 10^1] for MNIST. [...] Algorithm 1 The AMD test Input: Training Samples Z, X and Y, Iteration Epochs T for training, Testing Samples Z', X' and Y', Iteration Epochs B for testing
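
The mixture U = νP + (1 − ν)Q quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (the linked repository is the authoritative source). The function name `sample_mixture`, the placeholder pools standing in for original and generative images, and the fixed seed are all assumptions made for illustration. Only the ν grid and the sample size of 50 (the CIFAR10 setting) come from the quoted text.

```python
import random


def sample_mixture(p_pool, q_pool, nu, m, rng):
    """Draw m items from U = nu * P + (1 - nu) * Q.

    p_pool and q_pool are finite lists standing in for P and Q
    (an assumption; the paper uses original and generative images).
    """
    sample = []
    for _ in range(m):
        # rng.random() lies in [0, 1), so nu = 0 always picks Q
        # and nu = 1 always picks P.
        source = p_pool if rng.random() < nu else q_pool
        sample.append(rng.choice(source))
    return sample


rng = random.Random(0)  # fixed seed, chosen here only for repeatability

# Hypothetical placeholder pools; real experiments would hold image data.
p_pool = [("original", i) for i in range(1000)]
q_pool = [("generated", i) for i in range(1000)]

# The grid nu in {0.0, 0.1, ..., 1.0} from the quoted setup.
nus = [i / 10 for i in range(11)]

# One sample of size 50 per mixing weight.
mixtures = {nu: sample_mixture(p_pool, q_pool, nu, m=50, rng=rng) for nu in nus}
```

Each draw first picks the source distribution with probability ν and then samples from it, which is the standard way to sample a two-component mixture. At the endpoints, U coincides with Q (ν = 0) or with P (ν = 1).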