Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Authors: Xinghe Fu, Zhiyuan Yan, Taiping Yao, Shen Chen, Xi Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate and verify the effectiveness of our method through extensive experiments on widely used evaluation datasets. ... We report both frame-level and video-level AUC scores in Table 1 and compare our methods with previous state-of-the-art methods... Ablation Study. ... Figure 4: The results of robustness evaluation on the test set of FF++ (c23). Video-level AUC (%) is reported under five different types of perturbations following (Jiang et al. 2020). |
| Researcher Affiliation | Collaboration | 1College of Computer Science and Technology, Zhejiang University 2Youtu Lab, Tencent EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations and diagrams, such as Figure 2 depicting the overall pipeline. However, it does not include any explicit pseudocode blocks or algorithms with structured steps. |
| Open Source Code | No | The paper does not contain any statements or links indicating the release of source code for the described methodology. Phrases like 'We release our code...' or direct repository links are absent. |
| Open Datasets | Yes | For comprehensively assess the proposed method, we utilize five widely-used public datasets Face Forensics++ (FF++) (Rossler et al. 2019), Celeb-DF (CDF) (Li et al. 2020b), DFDC (Dolhansky et al. 2020), DFDCPreview (DFDCP) (Dolhansky et al. 2019), and DFD (Deepfakedetection 2021) in our experiments, following previous works (Mohseni et al. 2020; Yan et al. 2023c). |
| Dataset Splits | Yes | We use the training split of FF++ (Rossler et al. 2019) as the training set. For the cross-dataset evaluation, we test our model on datasets other than FF++. For the robustness evaluation, we test our model on the test split of FF++. |
| Hardware Specification | No | The paper mentions using 'ViT-B model as the backbone network of detectors' and initializes it with 'pre-trained weights from the vision encoder of CLIP', but it does not specify any hardware details like GPU models, CPU types, or memory specifications used for training or inference. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in the implementation, such as Python, PyTorch, or TensorFlow versions. |
| Experiment Setup | Yes | We utilize the ViT-B model as the backbone network of detectors. The backbone is initialized with the pre-trained weights from the vision encoder of CLIP (Radford et al. 2021) by default. We evenly sample 8 frames (Mohseni et al. 2020) from the training videos of FF++ (c23) to form the training set. For S-Branch, we divide the token map into 2 × 2 blocks in shuffling. For M-Branch, the mixing ratio r is set to 0.3. The rank for W is set to 4. The hyperparameters τ, λ1, and λ2 in loss functions are set to 0.1, 0.1, and 0.1 in the training. |
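The setup row above describes the two augmentation branches in enough detail to sketch them: S-Branch shuffles tokens within local 2×2 windows of the token map, and M-Branch mixes a fraction r = 0.3 of one sample's tokens with another's. The sketch below is an illustration only, not the authors' implementation: the function names (`shuffle_tokens_in_blocks`, `mix_tokens`) are hypothetical, and the exact shuffling granularity and mixing rule in the paper may differ (e.g. the mixing mask could be block-wise rather than per-token).

```python
import numpy as np


def shuffle_tokens_in_blocks(tokens, block=2, rng=None):
    """S-Branch sketch: independently permute the tokens inside each
    block x block window of an (H, W, D) token map."""
    rng = rng or np.random.default_rng(0)
    H, W, D = tokens.shape
    out = tokens.copy()
    for i in range(0, H, block):
        for j in range(0, W, block):
            # Flatten the window to a list of tokens, permute, restore.
            patch = out[i:i + block, j:j + block].reshape(-1, D)
            perm = rng.permutation(len(patch))
            out[i:i + block, j:j + block] = patch[perm].reshape(block, block, D)
    return out


def mix_tokens(tokens_a, tokens_b, r=0.3, rng=None):
    """M-Branch sketch: replace a random fraction r of tokens in sample A
    with the spatially corresponding tokens from sample B."""
    rng = rng or np.random.default_rng(0)
    H, W, _ = tokens_a.shape
    mask = rng.random((H, W)) < r  # True -> take this token from B
    return np.where(mask[..., None], tokens_b, tokens_a)
```

Both operations preserve the token-map shape, so the augmented maps can be fed to the ViT-B backbone unchanged; shuffling only rearranges tokens locally (the global token multiset is preserved), while mixing produces a map whose every token comes verbatim from one of the two inputs.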