Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Generalizable Detector for Generated Image

Authors: Qianshu Cai, Chao Wu, Yonggang Zhang, Jun Yu, Xinmei Tian

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our method exhibits superior generalizability compared to training-based methods [69, 68, 64], and outperforms the state-of-the-art (SOTA) training-free methods. Our main contributions can be summarized as follows: Comprehensive experimental results demonstrate that our DEnD framework not only surpasses the SOTA training-free method but also outperforms most training-based detectors. In this section, we conduct a series of experiments to evaluate generated image detectors within practical scenarios that involve unknown generative models. The experimental results demonstrate that our approach holds significant advantages.
Researcher Affiliation	Academia	Qianshu Cai (1), Chao Wu (2, 3), Yonggang Zhang (4), Jun Yu (5), Xinmei Tian (1). (1) MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; (2) Zhejiang University; (3) School of Artificial Intelligence, Hebei Institute of Communications; (4) The Hong Kong University of Science and Technology; (5) The School of Intelligence Science and Engineering, Harbin Institute of Technology, Shenzhen
Pseudocode	No	The paper describes the methodology and framework using mathematical formulations and descriptive text, but it does not contain an explicitly labeled 'Pseudocode' block or 'Algorithm' section with structured steps.
Open Source Code	Yes	Code is available at https://github.com/dav-joy-thon/DEnD-Detection.
Open Datasets	Yes	We evaluate the performance of generated image detectors on two commonly used datasets: ImageNet [9] and LSUN-BEDROOM [74]. For ImageNet, the generated images are generated ... For LSUN-BEDROOM, generated images are generated ... To demonstrate the superiority of our method in more realistic scenarios involving unknown generative models, we evaluate the detectors on two general and comprehensive benchmarks: GenImage [79] and AIGCDetectBenchmark [77]. ... To demonstrate the generalizability of our method on unavailable generative models, we also evaluate detectors on Sora [47]. ... We randomly select 5,000 images from LAION [57] as natural images.
Dataset Splits	Yes	To find an optimal threshold, we randomly separated 2,000 natural and generated images as a validation set (with no overlap with the test set) and used an algorithm to identify the optimal threshold in the validation set. This threshold was then applied to calculate the accuracy on the test set.
Hardware Specification	Yes	All experiments were conducted on a single NVIDIA GeForce RTX 4090 GPU with 24 GB memory.
Software Dependencies	No	The paper mentions employing the pre-trained self-supervised model DINOv2 ViT-L/14, but does not list specific version numbers for general software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	We adopted the DINOv2 ViT-L/14 model, recognized for its optimal balance between speed and performance. We set the batch size N = 128 and temperature coefficient τ = 0.6, which show the best performance (see Appendix I.1). Regarding the selection of m(x), we employ Gaussian noise with a mean of 0 and a variance of 0.04 (see Appendix H).
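
The Dataset Splits row says a threshold is chosen on a held-out validation set and then applied unchanged to the test set, but it does not name the algorithm used. The following is only a minimal sketch of one common way to do this: an exhaustive search over candidate thresholds that maximizes validation accuracy. The function names, the toy scores, and the convention that a higher score means "generated" are all assumptions for illustration; they are not taken from the paper or its code.

```python
from typing import Sequence, Tuple


def accuracy_at(threshold: float,
                natural_scores: Sequence[float],
                generated_scores: Sequence[float]) -> float:
    """Accuracy when a sample is predicted 'generated' iff score >= threshold.

    The score direction is an assumption made for this sketch.
    """
    correct = sum(s < threshold for s in natural_scores)
    correct += sum(s >= threshold for s in generated_scores)
    return correct / (len(natural_scores) + len(generated_scores))


def best_threshold(natural_scores: Sequence[float],
                   generated_scores: Sequence[float]) -> Tuple[float, float]:
    """Exhaustively search validation scores for the accuracy-maximizing threshold.

    Hypothetical helper: returns (threshold, validation_accuracy).
    Ties are resolved in favor of the smallest threshold.
    """
    if not natural_scores or not generated_scores:
        raise ValueError("both score lists must be non-empty")
    # Every observed score is a candidate; one extra candidate above the
    # maximum covers the case where everything is predicted 'natural'.
    candidates = sorted(set(natural_scores) | set(generated_scores))
    candidates.append(candidates[-1] + 1.0)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = accuracy_at(t, natural_scores, generated_scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Toy validation scores (made up for illustration only).
    val_natural = [0.1, 0.2, 0.3]
    val_generated = [0.6, 0.7, 0.25]
    threshold, val_acc = best_threshold(val_natural, val_generated)
    # The threshold is fixed here and reused as-is on disjoint test scores.
    test_acc = accuracy_at(threshold, [0.0, 0.5], [0.9, 0.3])
    print(threshold, val_acc, test_acc)
```

Keeping the validation and test samples disjoint, as the quoted passage describes, means the reported test accuracy is not inflated by tuning the threshold on the same images.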