Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Anchor-based Maximum Discrepancy for Relative Similarity Testing
Authors: Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets. |
| Researcher Affiliation | Academia | Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu The University of Melbourne, Australia EMAIL EMAIL |
| Pseudocode | Yes | Algorithm 1 The AMD test |
| Open Source Code | Yes | Codes are publicly available at: https://github.com/tmlr-group/AMD. |
| Open Datasets | Yes | Following the setups in [13, 14], we adapt MNIST and CIFAR10 as benchmark datasets, both of which comprise original and generative images. [...] Specifically, we randomly draw sample from HC3 dataset, |
| Dataset Splits | Yes | In the testing procedure, we use testing samples Z' = {z'_i}_{i=1}^m ∼ U^m, X' = {x'_i}_{i=1}^m ∼ P^m and Y' = {y'_i}_{i=1}^m ∼ Q^m, which are drawn independently of Z, X, and Y. This follows the data-splitting strategy commonly used in kernel-based hypothesis testing to ensure the validity of the test [21]. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware models (e.g., GPU/CPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | We set the sample size to 50 for CIFAR10 and 160 for MNIST. We denote by P the original images and Q the generative images, and set the U as U = νP + (1 − ν)Q with ν ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. [...] Figure 3 illustrates the impact of the regularization parameter λ on optimization with data augmentation. The results indicate that λ can be selected within a relatively broad range, specifically [10^−8, 10^3] for CIFAR10 and [10^−8, 10^1] for MNIST. [...] Algorithm 1 The AMD test Input: Training Samples Z, X and Y, Iteration Epochs T for training, Testing Samples Z', X' and Y', Iteration Epochs B for testing |
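
The mixture U = νP + (1 − ν)Q quoted in the Experiment Setup row can be read as a per-sample coin flip: each point is taken from P with probability ν and from Q otherwise. The sketch below illustrates that reading only; it is not the authors' implementation (their code is at the repository linked above). The function name `sample_mixture`, the placeholder arrays `p_pool` and `q_pool`, the feature dimension, and drawing with replacement are all assumptions made for illustration.

```python
import numpy as np


def sample_mixture(p_pool, q_pool, nu, m, rng):
    """Draw m rows from U = nu * P + (1 - nu) * Q.

    Each row comes from p_pool with probability nu, otherwise from q_pool.
    Rows are drawn with replacement (an assumption of this sketch).
    """
    from_p = rng.random(m) < nu  # True -> take this row from P
    p_rows = p_pool[rng.integers(0, len(p_pool), size=m)]
    q_rows = q_pool[rng.integers(0, len(q_pool), size=m)]
    return np.where(from_p[:, None], p_rows, q_rows), from_p


rng = np.random.default_rng(0)

# Hypothetical placeholder features standing in for original (P)
# and generative (Q) images; real data would be loaded here instead.
p_pool = np.zeros((1000, 8))
q_pool = np.ones((1000, 8))

m = 50  # sample size quoted for CIFAR10 (160 for MNIST)
nu_grid = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
samples = {nu: sample_mixture(p_pool, q_pool, nu, m, rng)[0] for nu in nu_grid}
```

Here `samples[nu]` holds one mixture sample per value of ν in the quoted grid. At ν = 0.0 every row comes from Q and at ν = 1.0 every row comes from P, which matches the two endpoints of the setup.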