SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection
Authors: Yachao Liang, Min Yu, Gang Li, Jianguo Jiang, Boquan Li, Feng Yu, Ning Zhang, Xiang Meng, Weiqing Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method outperforms the state-of-the-art methods in terms of cross-dataset generalization and robustness, without the participation of any fake video in model training. |
| Researcher Affiliation | Academia | 1Institute of Information Engineering, Chinese Academy of Sciences 2School of Cyber Security, University of Chinese Academy of Sciences 3Deakin University 4Harbin Engineering University 5Institute of Computing Technology, Chinese Academy of Sciences 6Institute of Forensic Science, Ministry of Public Security |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available here. ... We have released the source code and execution steps in the supplementary material and an anonymous repository. And we will open the repository once our paper gets published. |
| Open Datasets | Yes | The model is trained on LRS3 [7] and VoxCeleb2 [16] datasets, which contain 433 and 1326 hours of videos respectively. |
| Dataset Splits | No | The paper describes the datasets used for training (LRS3 and VoxCeleb2) and for testing/evaluation (FF++, FakeAVCeleb, KoDF), but it does not provide specific validation-split details (e.g., percentages or sample counts) for the training process or hyperparameter tuning in its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. The authors explicitly state in the NeurIPS checklist that they do not report this information. |
| Software Dependencies | No | The paper mentions software such as FFmpeg [61] and Whisper [54] but does not provide version numbers for these or for other key software components (e.g., deep learning frameworks or libraries). |
| Experiment Setup | No | The paper states that the training process follows the methodology of [56] and that a publicly available pretrained model is used. It does not explicitly provide hyperparameters or system-level training settings (e.g., learning rate, batch size, optimizer settings) for its own experiments in the main text. |