Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.
Researcher Affiliation | Academia | Juan Hu (1), Xin Liao (1), Difei Gao (2), Satoshi Tsutsui (3), Qian Wang (4), Zheng Qin (1), Mike Zheng Shou (2). (1) College of Computer Science and Electronic Engineering, Hunan University, China; (2) Show Lab, National University of Singapore, Singapore; (3) Rapid-Rich Object Search (ROSE) Lab, Nanyang Technological University, Singapore; (4) School of Cyber Science and Engineering, Wuhan University, China
Pseudocode | Yes | Algorithm 1: The algorithm process of Delocate.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that open-source code for the described methodology is provided.
Open Datasets | Yes | Four public Deepfake video datasets, i.e., FF++ [Rossler et al., 2019], CDF [Li et al., 2020b], DFo [Jiang et al., 2020], DFDC [Dolhansky et al., 2020] are utilized to evaluate the proposed method and existing methods.
Dataset Splits | Yes | To simulate unknown-domain detection during training, the Meta-train phase performs training by sampling many detection tasks and is validated by sampling many similar detection tasks from the Meta-test. ... We randomly split the training data into Meta-train and Meta-test with a 7:3 ratio.
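A minimal sketch of such a random 7:3 split, assuming the training data is available as a list of video identifiers (the function name, seed, and variable names are illustrative, not from the paper):

import random

def meta_split(video_ids, ratio=0.7, seed=0):
    """Randomly split training videos into Meta-train and Meta-test subsets (7:3)."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)   # fixed seed only for reproducibility of this sketch
    cut = int(len(ids) * ratio)
    return ids[:cut], ids[cut:]        # (Meta-train, Meta-test)

# Hypothetical usage, where `train_videos` lists the training video identifiers:
# meta_train, meta_test = meta_split(train_videos)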
Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions optimizers (AdamW, SGD) and tools (FFmpeg, dlib) with citations, but does not specify version numbers for the programming language or key software libraries required for reproduction.
Experiment Setup | Yes | In the Recovering stage, the masking ratio, batch size, patch size, and input size are set to 0.75, 8, 16, and 224, respectively. The AdamW [Loshchilov and Hutter, 2017] optimizer with an initial learning rate of 1.5 × 10^-4, momentum of 0.9, and a weight decay of 0.05 is utilized to train the recovery model. The fine-tuning of the Recovering stage utilizes the AdamW optimizer with an initial learning rate of 1 × 10^-3 to detect videos. The SGD optimizer is used for optimizing the Localization stage with an initial learning rate of 0.1, momentum of 0.9, and weight decay of 5 × 10^-4. We use FFmpeg [Lei et al., 2013] to extract 30 frames from each video. dlib [Sharma et al., 2016] is utilized to extract faces and detect 68 facial landmarks.
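As a rough illustration of this setup, the following PyTorch sketch instantiates the three optimizers with the reported values and shows the reported preprocessing steps. The model variables, file paths, the frame-sampling strategy, and the reading of AdamW's "momentum of 0.9" as its first beta are assumptions, not details given in the paper:

import subprocess
import dlib
from torch import optim

# Reported Recovering-stage settings.
MASK_RATIO, BATCH_SIZE, PATCH_SIZE, INPUT_SIZE = 0.75, 8, 16, 224

def build_optimizers(recovery_model, detection_head, localization_model):
    # Recovery model: AdamW, lr 1.5e-4, weight decay 0.05; the paper's
    # "momentum of 0.9" is interpreted here as AdamW's beta1 (assumption).
    recover_opt = optim.AdamW(recovery_model.parameters(),
                              lr=1.5e-4, betas=(0.9, 0.999), weight_decay=0.05)
    # Fine-tuning of the Recovering stage for detection: AdamW, lr 1e-3.
    finetune_opt = optim.AdamW(detection_head.parameters(), lr=1e-3)
    # Localization stage: SGD, lr 0.1, momentum 0.9, weight decay 5e-4.
    localize_opt = optim.SGD(localization_model.parameters(),
                             lr=0.1, momentum=0.9, weight_decay=5e-4)
    return recover_opt, finetune_opt, localize_opt

def extract_frames(video_path, out_dir, num_frames=30):
    # Extract 30 frames per video with FFmpeg; the paper does not say how frames
    # are sampled, so this sketch simply takes the first 30.
    subprocess.run(["ffmpeg", "-i", video_path, "-vframes", str(num_frames),
                    f"{out_dir}/frame_%02d.png"], check=True)

def detect_landmarks(image_path, predictor_path="shape_predictor_68_face_landmarks.dat"):
    # dlib face detection and 68-point facial landmarks (predictor file path is assumed).
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)
    img = dlib.load_rgb_image(image_path)
    faces = detector(img, 1)
    return [predictor(img, face) for face in faces]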