Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces
Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance. |
| Researcher Affiliation | Academia | Juan Hu¹, Xin Liao¹, Difei Gao², Satoshi Tsutsui³, Qian Wang⁴, Zheng Qin¹, Mike Zheng Shou² (¹College of Computer Science and Electronic Engineering, Hunan University, China; ²Show Lab, National University of Singapore, Singapore; ³Rapid-Rich Object Search (ROSE) Lab, Nanyang Technological University, Singapore; ⁴School of Cyber Science and Engineering, Wuhan University, China) |
| Pseudocode | Yes | Algorithm 1: The algorithm process of Delocate. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that open-source code for the described methodology is provided. |
| Open Datasets | Yes | Four public Deepfake video datasets, i.e., FF++ [Rossler et al., 2019], CDF [Li et al., 2020b], DFo [Jiang et al., 2020], DFDC [Dolhansky et al., 2020] are utilized to evaluate the proposed method and existing methods. |
| Dataset Splits | Yes | To simulate unknown domain detection during training, the Meta-train phase performs training by sampling many detection tasks, and is validated by sampling many similar detection tasks from the Meta-test. ... we randomly split the training data into Meta-train and Meta-test with a 7:3 ratio. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers (AdamW, SGD) and tools (FFmpeg, dlib) with citations, but does not specify version numbers for programming languages or key software libraries required for reproduction. |
| Experiment Setup | Yes | In the Recovering stage, the masking ratio, batch size, patch size, and input size are set to 0.75, 8, 16, and 224, respectively. The AdamW [Loshchilov and Hutter, 2017] optimizer with an initial learning rate of 1.5 × 10⁻⁴, momentum of 0.9, and a weight decay of 0.05 is used to train the recovery model. The fine-tuning of the Recovering stage uses the AdamW optimizer with an initial learning rate of 1 × 10⁻³ to detect videos. The SGD optimizer is used for the Localization stage with an initial learning rate of 0.1, momentum of 0.9, and weight decay of 5 × 10⁻⁴. FFmpeg [Lei et al., 2013] is used to extract 30 frames from each video, and dlib [Sharma et al., 2016] is used to extract faces and detect 68 facial landmarks. (Configuration and preprocessing sketches follow the table.) |
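For the 7:3 Meta-train/Meta-test split quoted above, here is a minimal sketch of one plausible implementation, assuming the training set is available as a list of video identifiers; `meta_split`, the seed, and the IDs are hypothetical, not taken from the paper:

```python
import random

def meta_split(video_ids, ratio=0.7, seed=0):
    """Randomly split training videos into Meta-train / Meta-test at 7:3."""
    rng = random.Random(seed)   # fixed seed only so the sketch is repeatable
    ids = list(video_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * ratio)
    return ids[:cut], ids[cut:]

# Hypothetical video identifiers standing in for the training set.
meta_train, meta_test = meta_split([f"vid_{i:04d}" for i in range(1000)])
print(len(meta_train), len(meta_test))  # 700 300
```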
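The optimizer settings in the experiment-setup row map directly onto standard PyTorch optimizers. The sketch below mirrors those hyperparameters; the three `nn.Linear` modules are hypothetical stand-ins for the paper's Recovering- and Localization-stage networks, and the "momentum of 0.9" for AdamW is read as its first-moment coefficient (beta1):

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the row above (Recovering stage).
MASK_RATIO, BATCH_SIZE, PATCH_SIZE, INPUT_SIZE = 0.75, 8, 16, 224

# Hypothetical stand-ins for the recovery model, the fine-tuned detector,
# and the Localization-stage network.
recovery_model = nn.Linear(INPUT_SIZE, INPUT_SIZE)
detector = nn.Linear(INPUT_SIZE, 2)
localizer = nn.Linear(INPUT_SIZE, 1)

# Recovery training: AdamW, lr 1.5e-4, beta1 (momentum) 0.9, weight decay 0.05.
recovery_opt = torch.optim.AdamW(
    recovery_model.parameters(), lr=1.5e-4, betas=(0.9, 0.999), weight_decay=0.05
)

# Fine-tuning of the Recovering stage for detection: AdamW, lr 1e-3.
finetune_opt = torch.optim.AdamW(detector.parameters(), lr=1e-3)

# Localization stage: SGD, lr 0.1, momentum 0.9, weight decay 5e-4.
localization_opt = torch.optim.SGD(
    localizer.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)
```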
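The preprocessing pipeline (FFmpeg frame extraction plus dlib face and 68-landmark detection) can likewise be sketched. The paths are hypothetical, taking the first 30 decoded frames is an assumption (the paper only says 30 frames are extracted per video), and the landmark model file is distributed separately by the dlib project:

```python
import glob
import os
import subprocess
import dlib

VIDEO, FRAME_DIR = "video.mp4", "frames"   # hypothetical paths
os.makedirs(FRAME_DIR, exist_ok=True)

# FFmpeg frame extraction; "-frames:v 30" keeps the first 30 decoded frames.
subprocess.run(
    ["ffmpeg", "-i", VIDEO, "-frames:v", "30", f"{FRAME_DIR}/%02d.png"],
    check=True,
)

# dlib frontal face detector plus the standard 68-point shape predictor.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

for path in sorted(glob.glob(f"{FRAME_DIR}/*.png")):
    img = dlib.load_rgb_image(path)
    for face in detector(img):        # detected face rectangles
        shape = predictor(img, face)  # 68 facial landmarks for this face
        points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```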