Region-Aware Temporal Inconsistency Learning for DeepFake Video Detection

Authors: Zhihao Gu, Taiping Yao, Yang Chen, Ran Yi, Shouhong Ding, Lizhuang Ma

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments and visualizations on several benchmarks demonstrate the effectiveness of our method against state-of-the-art competitors."
Researcher Affiliation | Collaboration | 1. School of Electronic and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; 2. Tencent Youtu Lab, Shanghai, China; 3. MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China
Pseudocode | No | The paper describes the proposed methods and architecture using text and diagrams, but does not include any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | "To evaluate the proposed method, we perform state-of-the-art comparison under intra-dataset and cross-dataset generalization settings on four popular benchmarks: FaceForensics++ (FF++) [Rossler et al., 2019], Celeb-DF [Li et al., 2020c], DFDC [Dolhansky et al., 2019], and WildDeepfake [Zi et al., 2020]."
Dataset Splits | No | The paper describes training parameters and testing on specific datasets but does not explicitly detail the training/validation/test splits, such as percentages or sample counts for a validation set.
Hardware Specification | No | The paper mentions "8 GPUs" but does not specify the exact GPU models, CPU types, or other hardware details used for the experiments.
Software Dependencies | No | The paper mentions a ResNet-50 backbone and the Adam optimizer, but does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | Yes | "During training, we sample U = 4 snippets and each snippet contains T = 4 frames. Images are resized to 224×224 as input to the network. The temperature scalar is set to 10^-2. We adopt the Adam algorithm to optimize the binary cross-entropy loss and train the network for 45 epochs on 8 GPUs with an initial learning rate of 10^-4. The learning rate is divided by 10 every 15 epochs and the batch size is 12. Random horizontal flip is employed as data augmentation. During inference, eight 4-length snippets are centrally sampled from each segment."
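The reported setup can be collected into a small configuration sketch. This is a minimal, illustrative reconstruction of the hyperparameters quoted above, not the authors' code; all names are hypothetical, and the step-decay helper only mirrors the stated "divide by 10 every 15 epochs" rule.

```python
# Hypothetical configuration sketch of the training setup reported in the
# paper; values are taken from the quoted text, names are illustrative.

U_SNIPPETS = 4          # snippets sampled per video during training
T_FRAMES = 4            # frames per snippet
INPUT_SIZE = (224, 224) # images resized to 224x224
TEMPERATURE = 1e-2      # temperature scalar
BATCH_SIZE = 12
NUM_EPOCHS = 45
INIT_LR = 1e-4          # initial learning rate for Adam


def learning_rate(epoch: int) -> float:
    """Step decay: the initial LR is divided by 10 every 15 epochs."""
    return INIT_LR * (0.1 ** (epoch // 15))


if __name__ == "__main__":
    # Epochs 0-14 train at 1e-4, 15-29 at 1e-5, 30-44 at 1e-6.
    for epoch in (0, 15, 30):
        print(f"epoch {epoch}: lr = {learning_rate(epoch):.0e}")
```

In a PyTorch training loop the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)` wrapped around an Adam optimizer, though the paper does not name a framework.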