Breaking Semantic Artifacts for Generalized AI-generated Image Detection
Authors: Chende Zheng, Chenhao Lin, Zhengyu Zhao, Hang Wang, Xu Guo, Shuai Liu, Chao Shen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive open-world evaluation on 31 test sets, covering 7 Generative Adversarial Networks, 18 (variants of) Diffusion Models, and another 6 CNN-based generative models. The results demonstrate that our approach outperforms previous approaches by 2.08% (absolute) on average regarding cross-scene detection accuracy. We also notice the superiority of our approach in open-world generalization, with an average accuracy improvement of 10.59% (absolute) across all test sets. |
| Researcher Affiliation | Academia | Chende Zheng1 Chenhao Lin1 Zhengyu Zhao1 Hang Wang2 Xu Guo3 Shuai Liu3 Chao Shen1 1School of Cyber Science and Engineering, Xi'an Jiaotong University 2School of Automation Science and Engineering, Xi'an Jiaotong University 3School of Software Engineering, Xi'an Jiaotong University zhengchende@stu.xjtu.edu.cn {linchenhao, zhengyu.zhao, cshangwang}@xjtu.edu.cn xuguo@stu.xjtu.edu.cn {sh_liu, chaoshen}@mail.xjtu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Zig-HS/FakeImageDetection. |
| Open Datasets | Yes | We employ the generated images from Diffusion DB [32]. The training set consists of 48,000 images generated by Stable Diffusion v1.4 using prompts from Internet Users and 48,000 real images from LAION-5B [33]. |
| Dataset Splits | Yes | We separate 5,000 images from the training set to serve as the validation set. |
| Hardware Specification | Yes | Our approach is implemented using PyTorch on an NVIDIA A100 40GB Tensor Core GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | During the training, we perform zero padding on the images to ensure the shorter edge is 256 pixels (resizing is also used as another pre-processing pipeline for ablation), and then randomly crop the images to 256×256. Each image will be horizontally or vertically flipped with a probability of 50% as data augmentation. The number of same-convolution blocks N is set to 18 and the size of each patch P is set to 32. The detector is trained using the Adam optimizer and early-stop strategy with an initial learning rate of 1e-4, a minimum learning rate of 1e-6, and a batch size of 64. |
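The pad-then-crop pre-processing described in the Experiment Setup row can be sketched as follows. This is a minimal pure-Python illustration of the geometry (padding amounts and crop coordinates), not the authors' released code; the function names `pad_to_min_edge` and `random_crop_coords` are our own.

```python
import random

def pad_to_min_edge(height, width, min_edge=256):
    """Total zero-padding needed along each axis so the
    shorter image edge becomes at least `min_edge` pixels."""
    pad_h = max(0, min_edge - height)
    pad_w = max(0, min_edge - width)
    return pad_h, pad_w

def random_crop_coords(height, width, size=256, rng=random):
    """Top-left corner of a uniformly random size x size crop.
    Assumes the (padded) image satisfies height, width >= size."""
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return top, left

# Example: a 200x300 image needs 56 rows of padding to reach
# a 256-pixel shorter edge, and no padding along the width.
print(pad_to_min_edge(200, 300))  # (56, 0)
```

In a real PyTorch pipeline these two steps would typically be composed with a 50%-probability horizontal/vertical flip as the data-augmentation stage described in the paper.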