Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes

Authors: Long Ma, Zhiyuan Yan, Jin Xu, Yize Chen, Qinglang Guo, Zhen Bi, Yong Liao, Hui Lin

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on seven widely used Deepfake detection datasets (encompassing over 58 distinct forgery methods, spanning facial manipulation categories including identity swapping, expression reenactment, generative face synthesis, and attribute editing) validate the superior efficacy of our method. Detailed ablation experiments and robustness experiments have demonstrated the effectiveness of each component in our method and the robustness of our method to disturbances.
Researcher Affiliation | Collaboration | (1) School of Cyber Science and Technology, University of Science and Technology of China; (2) School of Electronic and Computer Engineering, Peking University; (3) School of Data Science, The Chinese University of Hong Kong; (4) School of Information Science and Technology, University of Science and Technology of China; (5) Huzhou University; (6) Banbu AI Foundation; (7) China Academy of Electronics and Information Technology
Pseudocode | Yes | Algorithm 1: Pseudocode for FIA-USA
Open Source Code | No | The code is currently not open source.
Open Datasets | Yes | To evaluate the effectiveness of our proposed method, we conducted extensive experiments on seven widely adopted benchmark datasets spanning both classical facial manipulation paradigms and emerging generative deepfake architectures. a. Traditional Deepfake Datasets: 1. FaceForensics++ (FF++) [59], 2. Deepfake Detection (DFD) [12], 3. Deepfake Detection Challenge (DFDC) [15], 4. preview version of DFDC (DFDCP) [14], and 5. Celeb-DF (CDF) [43]. ... b. Generative Deepfake Datasets: 6. Diffusion Facial Forgery (DiFF) [5] ... 7. DF40 [80]
Dataset Splits | Yes | Under storage constraints, our implementation adopts FF++_c23 (including DFD), with training conducted following the SBI protocol [63] and employing exclusively real facial samples from the FF++_c23 subset. Although many previous studies have utilized the same dataset for both training and testing, the preprocessing and experimental configurations can differ, making fair comparisons difficult. Therefore, in addition to testing on the raw data of the aforementioned datasets, we also performed generalization assessments on the unified new benchmark for traditional deepfakes (DeepfakeBench) [82]. For video processing, each input is uniformly sampled into 32 frames during both training and inference phases.
Hardware Specification | Yes | All experiments were conducted on a single NVIDIA 3090.
Software Dependencies | No | The paper mentions using EfficientNet-B4 as the backbone and the SAM optimizer, but does not provide specific version numbers for any software libraries or packages.
Experiment Setup | Yes | We adopt EfficientNet-B4 [66] as the backbone network architecture (...), trained for 50 epochs using the SAM optimizer [21] with a batch size of 12 and an initial learning rate of 0.001. For video processing, each input is uniformly sampled into 32 frames during both training and inference phases. Our data augmentation pipeline combines the proposed FIA-USA strategy with conventional techniques including Random Horizontal Flip, Random Cut Out, and Add Gaussian Noise. The loss coefficients λ1, λ2, λ3 are empirically set to 1, 2.5, and 0.25, respectively (we also explored the impact of other variants on the detection results), with the temperature parameter τ fixed at 0.7.
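The SAM optimizer [21] named in the experiment setup replaces a single gradient step with a two-step "perturb, then descend" update. The following is a minimal plain-Python sketch of that update on a toy quadratic, not the authors' code. Only the learning rate of 0.001 comes from the reported setup; the neighbourhood radius `rho = 0.05` is an assumed value (no radius is reported), the base optimizer is assumed to be plain SGD, and `sam_step` and `toy_grad` are hypothetical names.

```python
import math
from typing import Callable

Vector = list[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 1e-3, rho: float = 0.05) -> Vector:
    """One Sharpness-Aware Minimization update with a plain SGD base step.

    Assumptions (not from the paper): rho = 0.05 and an SGD base optimizer.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # guard against zero gradient
    # Ascent: move to the (first-order) worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent: step from the ORIGINAL weights using the gradient at w_adv.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w: Vector) -> Vector:
    """Gradient of the toy loss L(w) = sum((w_i - 3)^2)."""
    return [2.0 * (wi - 3.0) for wi in w]


w = [0.0, 0.0]
for _ in range(5000):
    w = sam_step(w, toy_grad, lr=1e-3)
print([round(wi, 2) for wi in w])  # → [3.0, 3.0]
```

In a real training loop the two gradient evaluations run once per mini-batch (batches of 12 in the reported setup), so each SAM step costs roughly two forward-backward passes instead of one.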
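Both the dataset-split and setup entries state that every video is uniformly sampled into 32 frames. One plausible reading, assumed here, is 32 evenly spaced frame indices taken at the midpoints of 32 equal segments of the video. The paper's sampling code is not released, so `uniform_frame_indices` is a hypothetical helper illustrating that reading.

```python
def uniform_frame_indices(num_frames: int, num_samples: int = 32) -> list[int]:
    """Return `num_samples` evenly spaced frame indices for a video.

    Hypothetical helper: splits the video into `num_samples` equal segments
    and picks the floored midpoint of each. Videos shorter than
    `num_samples` frames yield repeated indices.
    """
    if num_frames <= 0:
        raise ValueError("video must contain at least one frame")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    segment = num_frames / num_samples
    # Midpoint of segment i is (i + 0.5) * segment; clamp to the last valid index.
    return [min(int((i + 0.5) * segment), num_frames - 1) for i in range(num_samples)]


print(uniform_frame_indices(320)[:4])  # → [5, 15, 25, 35]
```

Taking segment midpoints rather than segment starts avoids always sampling the very first frame, which in face videos is sometimes a fade-in or a partially detected face.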
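The conventional part of the augmentation pipeline (horizontal flip, cutout, and additive Gaussian noise) can be illustrated on a toy grayscale image stored as nested lists. This is a sketch only: FIA-USA itself is not reproduced, and the per-transform probability `p = 0.5`, the cutout size of 2 pixels, and the noise level `sigma = 0.05` are assumed values that the paper summary does not report. All function names are hypothetical.

```python
import random

Image = list[list[float]]  # grayscale image: rows of pixel values in [0, 1]


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left to right (returns new lists)."""
    return [row[::-1] for row in img]


def cutout(img: Image, size: int, rng: random.Random) -> Image:
    """Zero out a size x size square at a random position, clipped to the image."""
    h, w = len(img), len(img[0])
    top, left = rng.randrange(h), rng.randrange(w)
    out = [row[:] for row in img]  # copy so the input is left untouched
    for y in range(top, min(top + size, h)):
        for x in range(left, min(left + size, w)):
            out[y][x] = 0.0
    return out


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add N(0, sigma^2) noise to every pixel and clip back to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def augment(img: Image, rng: random.Random, p: float = 0.5) -> Image:
    """Apply each conventional augmentation independently with probability p.

    Assumed parameters: p = 0.5, cutout size 2, sigma = 0.05.
    """
    if rng.random() < p:
        img = horizontal_flip(img)
    if rng.random() < p:
        img = cutout(img, size=2, rng=rng)
    if rng.random() < p:
        img = add_gaussian_noise(img, sigma=0.05, rng=rng)
    return img


rng = random.Random(0)
image = [[0.5] * 8 for _ in range(8)]
augmented = augment(image, rng)
print(len(augmented), len(augmented[0]))  # → 8 8
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible for a fixed seed, which matters when comparing ablations.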
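The reported coefficients suggest a total objective of the form λ1·L1 + λ2·L2 + λ3·L3, with τ = 0.7 described as a temperature. The sketch below assumes a linear combination of three scalar loss terms and assumes τ divides similarity scores inside a softmax, as in an InfoNCE-style contrastive loss. What the three terms measure, and where τ actually enters, are not stated in this summary, so `total_loss` and `info_nce` are hypothetical illustrations; only the numeric values 1, 2.5, 0.25, and 0.7 come from the setup.

```python
import math
from typing import Sequence

LAMBDAS = (1.0, 2.5, 0.25)  # λ1, λ2, λ3 as reported
TAU = 0.7                   # temperature τ as reported


def total_loss(l1: float, l2: float, l3: float,
               lambdas: tuple[float, float, float] = LAMBDAS) -> float:
    """Weighted sum of three loss terms (assumed linear combination)."""
    a, b, c = lambdas
    return a * l1 + b * l2 + c * l3


def info_nce(sim_pos: float, sim_negs: Sequence[float], tau: float = TAU) -> float:
    """Temperature-scaled contrastive loss for one anchor.

    Assumption: tau divides the similarity scores before a softmax, and the
    loss is the negative log-probability assigned to the positive pair.
    """
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]  # = -log softmax(logits)[0]


print(total_loss(0.5, 0.25, 1.0))  # → 1.375
```

A smaller τ sharpens the softmax, so the same gap between positive and negative similarities yields a lower loss; when all similarities are equal, the loss reduces to the logarithm of the number of candidates regardless of τ.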