Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
Authors: Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, Li Yuan
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted extensive experiments on both deepfake detection and synthetic image detection benchmarks and find that our approach achieves significant superiority over other SOTAs with very little training cost. Compared to existing full-parameter and LoRA-based tuning methods, we explicitly ensure orthogonality, enabling a higher rank of the whole feature space, effectively minimizing overfitting and enhancing generalization. |
| Researcher Affiliation | Collaboration | 1Peking University Shenzhen Graduate School 2Tencent Youtu Lab 3The Chinese University of Hong Kong, Shenzhen. Correspondence to: Taiping Yao <EMAIL>, Li Yuan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Effort Approach Algorithm |
| Open Source Code | Yes | Our codes are publicly available at GitHub. |
| Open Datasets | Yes | For Protocol-1, we conduct evaluations by training the models on FaceForensics++ (FF++) (Rossler et al., 2019) and testing them on seven other deepfake detection datasets: Celeb-DF-v2 (CDF-v2) (Li et al., 2020b), Deepfake Detection (DFD) (DFD, 2020), Deepfake Detection Challenge (DFDC) (detection challenge, 2020), the preview version of DFDC (DFDCP) (Dolhansky et al., 2019), DeeperForensics (DFo) (Jiang et al., 2020), WildDeepfake (WDF) (Zi et al., 2020), and FFIW (Zhou et al., 2021). |
| Dataset Splits | Yes | For Protocol-1, we conduct evaluations by training the models on FaceForensics++ (FF++) (Rossler et al., 2019) and testing them on seven other deepfake detection datasets... For Protocol-2, we evaluate the models on the latest deepfake dataset DF40 (Yan et al., 2024b), which contains the forgery data generated within the FF++ domain... The evaluation set contains 19 subsets derived from different kinds of generative models, including ProGAN (Karras et al., 2018), CycleGAN (Zhu et al., 2017), BigGAN (Brock et al., 2018a)... |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions employing 'Adam (Kingma & Ba, 2014) for optimization' and using 'CLIP ViT-L/14' as the vision foundation model. It also states using 'codebases of DeepfakeBench (Yan et al., 2023b)'. However, it does not provide specific version numbers for these or other ancillary software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use a fixed learning rate of 2e-4 for training our approach and employ Adam (Kingma & Ba, 2014) for optimization. We set the batch size to 32 for both training and testing. We also employ several widely used data augmentations, such as Gaussian Blur and Image Compression... |
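The core idea quoted above, splitting a pretrained weight into a frozen principal subspace and an orthogonal residual subspace, can be illustrated with a minimal SVD-based sketch. This is not the authors' Algorithm 1; the function name `orthogonal_decompose` and the rank parameter `r` are illustrative assumptions, and the snippet only demonstrates that the two parts reconstruct the weight exactly while occupying mutually orthogonal subspaces:

```python
import numpy as np

def orthogonal_decompose(W, r):
    """Split a weight matrix via SVD into a top-r principal part
    (which would be kept frozen) and a residual part spanning the
    orthogonal complement (which would be tuned)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_principal = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
    W_residual = U[:, r:] @ np.diag(S[r:]) @ Vt[r:, :]
    return W_principal, W_residual

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
Wp, Wr = orthogonal_decompose(W, r=4)

# The split is exact, and the row spaces of the two parts are orthogonal,
# so updates confined to the residual cannot interfere with the principal
# directions -- the property the paper credits for reduced overfitting.
print(np.allclose(Wp + Wr, W))                 # True
print(np.allclose(Wp @ Wr.T, 0, atol=1e-10))   # True
```

Because the residual lives in the orthogonal complement of the frozen principal directions, tuning it raises the effective rank of the adapted feature space rather than rotating the pretrained one.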