Delving into Sequential Patches for Deepfake Detection
Authors: Jiazhi Guan, Hang Zhou, Zhibin Hong, Errui Ding, Jingdong Wang, Chengbin Quan, Youjian Zhao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on popular datasets validate that our approach effectively spots local forgery cues and achieves state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Jiazhi Guan (1,2), Hang Zhou (2), Zhibin Hong (2), Errui Ding (2), Jingdong Wang (2), Chengbin Quan (1), Youjian Zhao (1,3). 1: Department of Computer Science and Technology, Tsinghua University; 2: Department of Computer Vision Technology (VIS), Baidu Inc.; 3: Zhongguancun Laboratory |
| Pseudocode | No | The paper provides detailed descriptions and mathematical equations for its modules (LST, CPI, CPA) and a pipeline diagram (Fig. 1), but it does not include a section explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any structured code-like blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | Our experiments are conducted based on several popular deepfake datasets including FaceForensics++ (FF++) [45], the Deepfake Detection Challenge dataset (DFDC) [15], Celeb-DF-V2 (Celeb-DF) [35], the FaceShifter dataset (FaceSh) [30], and the DeeperForensics dataset (DeepFo) [24]. FF++ (HQ) is used as the training set and the remaining four datasets are used for generalization evaluation. |
| Dataset Splits | No | The paper mentions that the learning rate is decayed 'When the performance no longer improves significantly,' implying the use of a validation set, but it does not provide explicit details about the validation split (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Four NVIDIA A100 GPUs are used in our experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam [27]' and 'MTCNN [61]' and refers to specific convolutions from [22], but it does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | The spatial input size H×W and patch size P are set to 224×224 and 16, respectively. The embedding dimension D is set to 384. For the temporal dimension T, we empirically set it to 16 and provide more discussion in ablations. For the optimizer, we use Adam [27] with an initial learning rate of 10⁻⁴. |
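The reported hyperparameters (H×W = 224×224, P = 16, D = 384, T = 16) imply a fixed token budget per video clip. A minimal sketch of that arithmetic is below; the class and field names are illustrative, not taken from the paper, which released no code:

```python
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    """Hypothetical container for the paper's reported hyperparameters."""
    height: int = 224       # spatial input H
    width: int = 224        # spatial input W
    patch_size: int = 16    # patch size P
    embed_dim: int = 384    # embedding dimension D
    num_frames: int = 16    # temporal dimension T
    learning_rate: float = 1e-4  # initial Adam learning rate

    def patches_per_frame(self) -> int:
        # Non-overlapping P x P patches tile each H x W frame:
        # (224 / 16) * (224 / 16) = 14 * 14 = 196.
        return (self.height // self.patch_size) * (self.width // self.patch_size)

    def tokens_per_clip(self) -> int:
        # T frames, each contributing patches_per_frame tokens:
        # 16 * 196 = 3136.
        return self.num_frames * self.patches_per_frame()


cfg = ExperimentConfig()
print(cfg.patches_per_frame())  # 196
print(cfg.tokens_per_clip())    # 3136
```

Under these assumed values, each clip yields 3136 patch tokens of dimension 384, which is the sequence the paper's transformer-style modules (LST, CPI, CPA) would operate over.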