Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals

Authors: Xinghe Fu, Zhiyuan Yan, Zheng Yang, Taiping Yao, Yandan Zhao, Shouhong Ding, Xi Li

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiment results are striking and highly surprising: PiD achieves 98% accuracy on the widely used GenImage benchmark, highlighting the effectiveness and generalization performance. We conduct extensive experiments on existing widely used benchmarks and demonstrate the surprisingly high generalization performance over other SOTAs.
Researcher Affiliation Collaboration 1 College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 2 Youtu Lab, Tencent, Shanghai, China. Correspondence to: Xi Li <EMAIL>, Taiping Yao <EMAIL>.
Pseudocode Yes Algorithm 1 PiD Training Pipeline
Open Source Code No The paper does not explicitly state that the code is publicly available, nor does it provide a link to a code repository for the methodology described.
Open Datasets Yes Training datasets. We consider different training settings following ForenSynths (Wang et al., 2020) and GenImage (Zhu et al., 2024)... Test datasets. To evaluate the generalization performance of different approaches in real-world scenarios, we test the models on 3 widely used datasets with 26 generative models: the UniversalFakeDetect dataset (Ojha et al., 2023), the GenImage dataset (Zhu et al., 2024), and the Self-Synthesis GAN dataset (Tan et al., 2024c).
Dataset Splits No The paper mentions 'training set' and 'test set' for various datasets (e.g., 'All the detection models are trained on the training set of ForenSynths', 'we test the models on 3 widely used datasets'), but it does not specify the exact percentages or sample counts for these splits within the paper. It refers to existing benchmarks for the datasets.
Hardware Specification No The paper mentions 'We compare different models and report the parameter numbers and inference time of each model on the same device' in Table 7, but it does not specify the exact model or type of hardware (e.g., GPU, CPU) used for running the experiments.
Software Dependencies No The paper states 'We implement the detector network with a simple customized ResNet architecture' and mentions using an 'SGD optimizer', but it does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup Yes We train the detector network for 50 epochs with batch size 64. The network is optimized with an SGD optimizer with a learning rate of 0.001.
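The reported hyperparameters (50 epochs, batch size 64, SGD, learning rate 0.001) can be sketched as a minimal training loop. This is an illustrative stand-in only: the paper's detector is a customized ResNet trained on image residuals, which we replace here with a toy logistic-regression classifier on 1-D features, since the paper does not release its implementation.

```python
import math
import random

# Hyperparameters as reported in the paper's experiment setup.
# The model below (logistic regression on scalar features) is an
# assumed placeholder for the paper's customized ResNet detector.
EPOCHS = 50
BATCH_SIZE = 64
LR = 0.001

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_train(data, epochs=EPOCHS, batch_size=BATCH_SIZE, lr=LR):
    """Plain SGD over (feature, label) pairs with labels in {0, 1}."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Accumulate the logistic-loss gradient over the mini-batch.
            gw = gb = 0.0
            for x, y in batch:
                p = sigmoid(w * x + b)
                gw += (p - y) * x
                gb += (p - y)
            # SGD step with the reported learning rate of 0.001.
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b

# Toy separable data: negative features -> label 0, positive -> label 1.
data = [(-1.0 - 0.01 * i, 0) for i in range(64)] + \
       [(1.0 + 0.01 * i, 1) for i in range(64)]
w, b = sgd_train(list(data))
```

With the small learning rate the learned weight moves slowly but in the correct direction (w ends up positive on this data), which mirrors why the paper pairs lr=0.001 with a comparatively long 50-epoch schedule.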