Detecting Deepfake Videos with Temporal Dropout 3DCNN

Authors: Daichi Zhang, Chenyu Li, Fanzhao Lin, Dan Zeng, Shiming Ge

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental Extensive experiments on popular benchmarks clearly demonstrate the effectiveness and generalization capacity of our approach.
Researcher Affiliation Academia 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100095, China; 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; 3 School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
Pseudocode Yes Algorithm 1 Temporal Dropout
Open Source Code No The paper does not provide an explicit statement about releasing the code for their own methodology, nor does it provide a direct link to a code repository for their work. It only links to third-party tools (e.g., FFmpeg, MobileNet) or datasets.
Open Datasets Yes We conduct experiments on three deepfake video datasets: the Celeb-DF(v2) [Li et al., 2020c], DFDC [Dolhansky et al., 2019] and FaceForensics++ [Rössler et al., 2019].
Dataset Splits Yes We divide our training, validation and testing sets by a ratio of 6:2:2. And finally, we obtain 7,528 videos as training set, 2,482 videos as validation set and 2,541 videos as testing set. ... To obtain the training, validation and testing sets, we randomly split the videos in the FaceForensics++ dataset in 6:2:2. Finally we obtain 4,074 videos as training set, 1,269 videos as validation set and 1,363 videos as testing set.
Hardware Specification Yes We implement all the models with PyTorch [Paszke et al., 2019] on NVIDIA TITAN Xp.
Software Dependencies No The paper mentions PyTorch, Adam, FFmpeg, and MobileNet, but does not provide specific version numbers for these software components. For example, it only says "PyTorch [Paszke et al., 2019]" without a version number like 1.9.
Experiment Setup Yes During training, we set the batch size as 16 and the total epoch is 50. The model is trained via Adam [Kingma and Ba, 2015] optimization with the global learning rate set as 10^-5 and weight decay set as 10^-6. We adopt the cross-entropy as the loss function. The activation function of all layers is the ReLU function. All 3D convolution layer strides are 1×1×1 and all pooling layer strides are 2×2×2 using "same" padding. For the Temporal Dropout module, we set n = 20, α = 1.25, which means that we sample 20 × 1.25 = 25 continuous frames and then randomly choose 20 frames from them.
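The 6:2:2 train/validation/test split quoted above can be reproduced in spirit with a few lines of Python. This is a minimal sketch, not the authors' code (the paper reports no released code); the function name and seed are hypothetical, and the paper does not specify how rounding or per-class balancing was handled.

```python
import random

def split_videos(video_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split video IDs into train/val/test by a 6:2:2 ratio.

    Hypothetical helper: the paper only states the ratio, not the
    shuffling procedure or seed.
    """
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# 12,551 = 7,528 + 2,482 + 2,541 videos in the Celeb-DF(v2) split reported above
train, val, test = split_videos(range(12551))
```

Note that integer truncation makes the resulting set sizes differ slightly from the paper's exact counts, which suggests the authors split per class or rounded differently.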
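The Temporal Dropout sampling described in the experiment setup (sample 20 × 1.25 = 25 continuous frames, then randomly keep 20) can be sketched as follows. This is an illustrative reconstruction from the quoted text, not the paper's Algorithm 1; the function name and the choice of a uniformly random window start are assumptions.

```python
import random

def temporal_dropout(num_frames, n=20, alpha=1.25):
    """Sample n frame indices via temporal dropout (sketch).

    A continuous window of round(n * alpha) frames is drawn at a random
    position, then window - n frames are randomly dropped, leaving n
    frames in temporal order.
    """
    window = round(n * alpha)                    # 20 * 1.25 = 25 continuous frames
    start = random.randint(0, num_frames - window)
    candidates = range(start, start + window)
    return sorted(random.sample(candidates, n))  # drop 5 frames at random

indices = temporal_dropout(num_frames=300)  # e.g. a 300-frame video clip
```

Dropping frames at random positions within the window perturbs the temporal sampling rate, which is what gives the 3DCNN robustness to temporal variations between real and fake videos.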