Can We Leave Deepfake Data Behind in Training Deepfake Detector?

Authors: Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments confirm that our design allows leveraging forgery information from both blendfake and deepfake effectively and comprehensively. 4 Experiments. In Tab. 1, we provide extensive comparison results with existing state-of-the-art (SoTA) deepfake detectors based on DeepfakeBench [52], where all methods are trained on FF++ (HQ) and tested on other datasets. 4.3 Ablation Study.
Researcher Affiliation | Collaboration | Jikang Cheng (1), Zhiyuan Yan (2), Ying Zhang (3), Yuhao Luo (2), Zhongyuan Wang (1), Chen Li (3); (1) School of Computer Science, Wuhan University; (2) The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); (3) WeChat, Tencent Inc.
Pseudocode | Yes | Algorithm 1: Training ProDet
Open Source Code | Yes | Code is available at https://github.com/beautyremain/ProDet.
Open Datasets | Yes | FaceForensics++ (FF++) [37] is constructed with four forgery methods: Deepfakes (DF) [15], Face2Face (F2F) [44], FaceSwap (FS) [18], and NeuralTextures (NT) [43]. FF++ with High Quality (HQ) is employed as the training dataset for all experiments in our paper. The base images used to generate blendfake images are also taken from FF++ (HQ) real. For cross-dataset evaluations, we introduce Celeb-DF-v1 (CDFv1) [29], Celeb-DF-v2 (CDFv2) [29], DeepFake Detection Challenge Preview (DFDCP) [16], and DeepFake Detection Challenge (DFDC) [16]. (A hedged protocol sketch follows the table.)
Dataset Splits | No | The paper specifies FF++ (HQ) as the training dataset and other datasets for cross-dataset evaluations (testing), but it does not explicitly provide details about a validation split within these datasets, such as percentages or sample counts.
Hardware Specification | Yes | All experiments are conducted on two NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions software components such as 'EfficientNet-B4 [42]', the 'Adam optimizer', and 'Dlib [25]', but it does not provide specific version numbers for the programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, Dlib). (A hedged version-reporting sketch follows the table.)
Experiment Setup | Yes | The trade-off parameters are set to β = 1 and γ = 10. The Adam optimizer is used with a learning rate of 0.0002, 20 epochs, an input size of 256 × 256, and a batch size of 24. Feature Bridging is deployed after a warm-up phase of two epochs. (A hedged training-setup sketch follows the table.)
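
The Open Datasets row above describes a concrete protocol: train on FF++ (HQ), synthesize blendfake images from FF++ (HQ) real faces, and evaluate cross-dataset on CDFv1, CDFv2, DFDCP, and DFDC. The minimal sketch below restates that protocol as a configuration dict; the key names are illustrative and are not taken from the paper or from DeepfakeBench.

```python
# Hypothetical config fragment restating the evaluation protocol from the
# Open Datasets row; key names are illustrative, not the authors' own.
PROTOCOL = {
    "train": "FaceForensics++ (HQ)",                 # sole training dataset
    "blendfake_base": "FaceForensics++ (HQ) real",   # source faces for blendfake synthesis
    "cross_dataset_test": ["CDFv1", "CDFv2", "DFDCP", "DFDC"],
}
```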
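Because the Software Dependencies row notes that no version numbers are given, the following minimal sketch shows how a reproduction could record the versions it actually ran with. It assumes a PyTorch plus Dlib environment, which the paper does not state explicitly.

```python
import sys
import torch  # assumed framework; the paper does not name PyTorch or TensorFlow
import dlib   # Dlib [25] is cited for face processing, version unspecified

# Print a minimal environment report of the kind the paper omits.
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)
print("Dlib   :", dlib.__version__)
```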
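Finally, a minimal sketch of the Experiment Setup row under stated assumptions: PyTorch with the EfficientNet-B4 backbone loaded via timm (the paper names the backbone but not how it is instantiated), and a placeholder loss composition, since the paper only states that β = 1 and γ = 10 are trade-off parameters without giving the exact loss terms here.

```python
import torch
import timm  # assumed source for the EfficientNet-B4 backbone

# Hyperparameters quoted in the Experiment Setup row.
BETA, GAMMA = 1.0, 10.0      # trade-off parameters for the weighted loss terms
LR, EPOCHS, BATCH_SIZE = 2e-4, 20, 24
INPUT_SIZE = 256             # inputs are resized to 256 x 256
WARMUP_EPOCHS = 2            # Feature Bridging is deployed only after this phase

# Hypothetical backbone instantiation; the paper specifies EfficientNet-B4 [42].
model = timm.create_model("efficientnet_b4", pretrained=True, num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

for epoch in range(EPOCHS):
    feature_bridging_on = epoch >= WARMUP_EPOCHS  # two-epoch warm-up
    # for images, labels in train_loader:   # FF++ (HQ) loader, batch size 24, not shown
    #     loss = cls_loss + BETA * aux_loss_1 + GAMMA * aux_loss_2  # hypothetical weighting
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

The data-loading and loss details are deliberately left as comments: the batch size and input size map onto the (unshown) FF++ (HQ) loader, and the actual loss terms weighted by β and γ are defined in the paper's method section, not in this row.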