Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Anchor-based Maximum Discrepancy for Relative Similarity Testing

Authors: Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets.
Researcher Affiliation | Academia | Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu The University of Melbourne, Australia EMAIL EMAIL
Pseudocode | Yes | Algorithm 1 The AMD test
Open Source Code | Yes | Codes are publicly available at: https://github.com/tmlr-group/AMD.
Open Datasets | Yes | Following the setups in [13, 14], we adapt MNIST and CIFAR10 as benchmark datasets, both of which comprise original and generative images. [...] Specifically, we randomly draw sample from HC3 dataset,
Dataset Splits | Yes | In the testing procedure, we use testing samples Z' = {z'_i}_{i=1}^m ∼ U^m, X' = {x'_i}_{i=1}^m ∼ P^m and Y' = {y'_i}_{i=1}^m ∼ Q^m, which are drawn independently of Z, X, and Y. This follows the data-splitting strategy commonly used in kernel-based hypothesis testing to ensure the validity of the test [21].
Hardware Specification | No | The paper does not explicitly mention specific hardware models (e.g., GPU/CPU models) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | We set the sample size to 50 for CIFAR10 and 160 for MNIST. We denote by P the original images and Q the generative images, and set the U as U = νP + (1 − ν)Q with ν ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. [...] Figure 3 illustrates the impact of the regularization parameter λ on optimization with data augmentation. The results indicate that λ can be selected within a relatively broad range, specifically [10^−8, 10^3] for CIFAR10 and [10^−8, 10^1] for MNIST. [...] Algorithm 1 The AMD test Input: Training Samples Z, X and Y, Iteration Epochs T for training, Testing Samples Z', X' and Y', Iteration Epochs B for testing
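The mixture U = νP + (1 − ν)Q and the independent training/testing samples quoted above can be illustrated with a short sketch. This is not the authors' code (see the repository linked above for that). It assumes the P and Q samples are available as 2-D NumPy arrays of flattened feature vectors, and it assumes the independent test set is obtained by drawing 2m points and splitting them into two disjoint halves. The function names sample_mixture and split_train_test are hypothetical.

```python
import numpy as np


def sample_mixture(p_pool, q_pool, nu, n, rng):
    """Draw n rows from the mixture U = nu * P + (1 - nu) * Q.

    p_pool and q_pool are 2-D arrays (one sample per row) standing in
    for samples from P and Q. Rows are drawn with replacement.
    """
    # Each draw comes from P with probability nu, otherwise from Q.
    from_p = rng.random(n) < nu
    p_idx = rng.integers(0, len(p_pool), size=n)
    q_idx = rng.integers(0, len(q_pool), size=n)
    return np.where(from_p[:, None], p_pool[p_idx], q_pool[q_idx])


def split_train_test(samples, rng):
    """Shuffle and split into two disjoint, equally sized halves.

    Assumption: the first half plays the role of the training sample
    and the second half the role of the testing sample.
    """
    perm = rng.permutation(len(samples))
    half = len(samples) // 2
    return samples[perm[:half]], samples[perm[half:2 * half]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for P (original) and Q (generative) samples.
    p_pool = rng.normal(0.0, 1.0, size=(1000, 8))
    q_pool = rng.normal(0.5, 1.0, size=(1000, 8))
    m = 50  # sample size quoted for CIFAR10
    for nu in (0.0, 0.5, 1.0):
        z_all = sample_mixture(p_pool, q_pool, nu, 2 * m, rng)
        z_train, z_test = split_train_test(z_all, rng)
        print(nu, z_train.shape, z_test.shape)
```

With ν = 1 every draw comes from P and with ν = 0 every draw comes from Q, so the grid of ν values quoted in the table moves U from the Q end to the P end.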