Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Can We Leave Deepfake Data Behind in Training Deepfake Detector?
Authors: Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments confirm that our design allows leveraging forgery information from both blendfake and deepfake effectively and comprehensively. 4 Experiments. In Tab. 1, we provide extensive comparison results with existing state-of-the-art (SoTA) deepfake detectors based on DeepfakeBench [52], where all methods are trained on FF++ (HQ) and tested on other datasets. 4.3 Ablation Study. |
| Researcher Affiliation | Collaboration | Jikang Cheng1, Zhiyuan Yan2, Ying Zhang3, Yuhao Luo2, Zhongyuan Wang1, Chen Li3; 1 School of Computer Science, Wuhan University; 2 The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); 3 WeChat, Tencent Inc. |
| Pseudocode | Yes | Algorithm 1: Training ProDet |
| Open Source Code | Yes | Code is available at https://github.com/beautyremain/ProDet. |
| Open Datasets | Yes | FaceForensics++ (FF++) [37] is constructed by four forgery methods including Deepfakes (DF) [15], Face2Face (F2F) [44], FaceSwap (FS) [18], and NeuralTextures (NT) [43]. FF++ with High Quality (HQ) is employed as the training dataset for all experiments in our paper. The base images to generate blendfake images are also from FF++ (HQ) real. For cross-dataset evaluations, we introduce Celeb-DF-v1 (CDFv1) [29], Celeb-DF-v2 (CDFv2) [29], DeepFake Detection Challenge Preview (DFDCP) [16], and DeepFake Detection Challenge (DFDC) [16]. |
| Dataset Splits | No | The paper specifies FF++ (HQ) as the training dataset and other datasets for cross-dataset evaluations (testing), but it does not explicitly provide details about a validation split within these datasets, such as percentages or sample counts. |
| Hardware Specification | Yes | All experiments are conducted on two NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software components such as 'EfficientNetB4 [42]', 'Adam optimizer', and 'Dlib [25]', but it does not provide specific version numbers for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, Dlib version). |
| Experiment Setup | Yes | The trade-off parameters are set to β = 1 and γ = 10. The Adam optimizer is used with a learning rate of 0.0002, epoch of 20, input size of 256 × 256, and batch size of 24. Feature Bridging is deployed after a warm-up phase of two epochs. |
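
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the names `TrainConfig` and `feature_bridging_enabled` are hypothetical, epochs are assumed to be 0-indexed, and "after a warm-up phase of two epochs" is assumed to mean that Feature Bridging is active from the third epoch onward.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (field names are illustrative)."""
    beta: float = 1.0                        # trade-off parameter β
    gamma: float = 10.0                      # trade-off parameter γ
    learning_rate: float = 2e-4              # Adam learning rate (0.0002)
    epochs: int = 20                         # total training epochs
    input_size: Tuple[int, int] = (256, 256) # input resolution in pixels
    batch_size: int = 24
    warmup_epochs: int = 2                   # epochs before Feature Bridging is deployed


def feature_bridging_enabled(epoch: int, cfg: TrainConfig) -> bool:
    """Return True once the warm-up phase is over.

    Assumption: `epoch` is 0-indexed, so epochs 0 and 1 form the warm-up.
    """
    return epoch >= cfg.warmup_epochs


cfg = TrainConfig()
# One flag per epoch: False during warm-up, True afterwards.
schedule = [feature_bridging_enabled(e, cfg) for e in range(cfg.epochs)]
```

A training loop could consult `schedule[epoch]` to decide whether to include the Feature Bridging step. How the paper's own implementation gates this step is not stated in the table, so the linked repository remains the reference.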