Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals

Authors: Xinghe Fu, Zhiyuan Yan, Zheng Yang, Taiping Yao, Yandan Zhao, Shouhong Ding, Xi Li

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiment results are striking and highly surprising: PiD achieves 98% accuracy on the widely used GenImage benchmark, highlighting the effectiveness and generalization performance. We conduct extensive experiments on existing widely used benchmarks and demonstrate the surprisingly high generalization performance over other SOTAs.
Researcher Affiliation Collaboration 1 College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 2 Youtu Lab, Tencent, Shanghai, China. Correspondence to: Xi Li <EMAIL>, Taiping Yao <EMAIL>.
Pseudocode Yes Algorithm 1 PiD Training Pipeline
Open Source Code No The paper does not explicitly state that the code is publicly available, nor does it provide a link to a code repository for the methodology described.
Open Datasets Yes Training datasets. We consider different training settings following ForenSynths (Wang et al., 2020) and GenImage (Zhu et al., 2024)... Test datasets. To evaluate the generalization performance of different approaches in real-world scenarios, we test the models on 3 widely used datasets with 26 generative models: the UniversalFakeDetect dataset (Ojha et al., 2023), the GenImage dataset (Zhu et al., 2024), and the Self-Synthesis GAN dataset (Tan et al., 2024c).
Dataset Splits No The paper mentions 'training set' and 'test set' for various datasets (e.g., 'All the detection models are trained on the training set of ForenSynths', 'we test the models on 3 widely used datasets'), but it does not specify the exact percentages or sample counts for these splits within the paper. It refers to existing benchmarks for the datasets.
Hardware Specification No The paper mentions 'We compare different models and report the parameter numbers and inference time of each model on the same device' in Table 7, but it does not specify the exact model or type of hardware (e.g., GPU, CPU) used for running the experiments.
Software Dependencies No The paper states 'We implement the detector network with a simple customized ResNet architecture' and mentions using an 'SGD optimizer', but it does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup Yes We train the detector network for 50 epochs with batch size 64. The network is optimized with an SGD optimizer with a learning rate of 0.001.
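The reported hyperparameters (50 epochs, batch size 64, SGD, learning rate 0.001) can be sketched as a minimal training loop. This is an illustrative stand-in only: the paper's detector is a customized ResNet trained on image residuals, which we replace here with a toy logistic-regression classifier on 1-D features, since the paper does not release its implementation.

```python
import math
import random

# Hyperparameters as reported in the paper's experiment setup.
# The model below (logistic regression on scalar features) is an
# assumed placeholder for the paper's customized ResNet detector.
EPOCHS = 50
BATCH_SIZE = 64
LR = 0.001

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_train(data, epochs=EPOCHS, batch_size=BATCH_SIZE, lr=LR):
    """Plain SGD over (feature, label) pairs with labels in {0, 1}."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Accumulate the logistic-loss gradient over the mini-batch.
            gw = gb = 0.0
            for x, y in batch:
                p = sigmoid(w * x + b)
                gw += (p - y) * x
                gb += (p - y)
            # SGD step with the reported learning rate of 0.001.
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b

# Toy separable data: negative features -> label 0, positive -> label 1.
data = [(-1.0 - 0.01 * i, 0) for i in range(64)] + \
       [(1.0 + 0.01 * i, 1) for i in range(64)]
w, b = sgd_train(list(data))
```

With the small learning rate the learned weight moves slowly but in the correct direction (w ends up positive on this data), which mirrors why the paper pairs lr=0.001 with a comparatively long 50-epoch schedule.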