Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals
Authors: Xinghe Fu, Zhiyuan Yan, Zheng Yang, Taiping Yao, Yandan Zhao, Shouhong Ding, Xi Li
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiment results are striking and highly surprising: PiD achieves 98% accuracy on the widely used GenImage benchmark, highlighting the effectiveness and generalization performance. We conduct extensive experiments on existing widely used benchmarks and demonstrate the surprisingly high generalization performance over other SOTAs. |
| Researcher Affiliation | Collaboration | (1) College of Computer Science and Technology, Zhejiang University, Hangzhou, China; (2) Youtu Lab, Tencent, Shanghai, China. Correspondence to: Xi Li <EMAIL>, Taiping Yao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: PiD Training Pipeline |
| Open Source Code | No | The paper does not explicitly state that the code is publicly available, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Training datasets. We consider different training settings following ForenSynths (Wang et al., 2020) and GenImage (Zhu et al., 2024)... Test datasets. To evaluate the generalization performance of different approaches in real-world scenarios, we test the models on 3 widely used datasets with 26 generative models: the UniversalFakeDetect dataset (Ojha et al., 2023), the GenImage dataset (Zhu et al., 2024), and the Self-Synthesis GAN dataset (Tan et al., 2024c). |
| Dataset Splits | No | The paper mentions 'training set' and 'test set' for various datasets (e.g., 'All the detection models are trained on the training set of ForenSynths', 'we test the models on 3 widely used datasets'), but it does not specify the exact percentages or sample counts for these splits within the paper. It refers to existing benchmarks for the datasets. |
| Hardware Specification | No | The paper mentions 'We compare different models and report the parameter numbers and inference time of each model on the same device' in Table 7, but it does not specify the exact model or type of hardware (e.g., GPU, CPU) used for running the experiments. |
| Software Dependencies | No | The paper states 'We implement the detector network with a simple customized ResNet architecture' and mentions using an 'SGD optimizer', but it does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | We train the detector network for 50 epochs with batch size 64. The network is optimized with an SGD optimizer with a learning rate of 0.001. |
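The reported setup (50 epochs, batch size 64, vanilla SGD with learning rate 0.001) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the customized ResNet detector is not specified in enough detail to reproduce, so a bare parameter list stands in for the network weights, and the gradients are toy values.

```python
# Hyperparameters as reported in the paper's experiment setup.
EPOCHS = 50
BATCH_SIZE = 64
LEARNING_RATE = 0.001  # SGD learning rate

def sgd_step(params, grads, lr=LEARNING_RATE):
    """One vanilla SGD update: w <- w - lr * grad."""
    return [w - lr * g for w, g in zip(params, grads)]

# Toy usage: two scalar parameters with constant gradients
# standing in for one minibatch update of the detector network.
params = [0.5, -0.3]
grads = [0.2, -0.1]
params = sgd_step(params, grads)
```

In a full run, the update above would be applied once per minibatch of 64 images, over 50 passes through the training set.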