Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

X2-DFD: A framework for explainable and extendable Deepfake Detection

Authors: Yize Chen, Zhiyuan Yan, Guangliang Cheng, Kangran Zhao, Siwei Lyu, Baoyuan Wu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments and ablations, followed by a comprehensive human study, validate the improved performance of our approach compared to the original MLLMs. More encouragingly, our framework is designed to be plug-and-play, allowing it to seamlessly integrate with future more advanced MLLMs and specific feature detectors, leading to continual improvement and extension to face the challenges of rapidly evolving deepfakes.
Researcher Affiliation	Academia	Yize Chen (1,2), Zhiyuan Yan (3), Guangliang Cheng (4), Kangran Zhao (1,2), Siwei Lyu (5), Baoyuan Wu (1,2). (1) The Chinese University of Hong Kong, Shenzhen; (2) Shenzhen Loop Area Institute; (3) School of Electronic and Computer Engineering, Peking University, P.R. China; (4) Department of Computer Science, University of Liverpool, Liverpool, L69 7ZX, UK; (5) Department of Computer Science and Engineering, University at Buffalo, State University of New York, Buffalo, NY, USA
Pseudocode	No	The paper describes the methodology in detailed prose and through figures (e.g., Figure 1, Figure 3), outlining sequential steps for each stage of the X2-DFD framework. However, it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor any structured, code-like formatted procedures within the main text or appendices.
Open Source Code	Yes	Code can be found on https://github.com/chenyize111/X2DFD.
Open Datasets	Yes	We evaluate our proposed method on a diverse set of widely-used deepfake detection datasets, including the Deepfake Detection Challenge (DFDC) [15], its preview version (DFDCP) [16], Deepfake Detection (DFD) [13], Celeb-DF-v2 (CDF-v2) [37], FaceForensics++ (FF++) [57] (c23 version for training), DFo [29], WildDeepfake (WDF) [95], FFIW [92], and the newly released DF40 dataset [75], which incorporates state-of-the-art forgery techniques such as Facedancer [56], FSGAN [50], inSwap [58], e4s [33], Simswap [7], and Uniface [91]. In line with the standard deepfake benchmark [78], we use the c23 version of FF++ for training and other datasets for testing in the main table. Additionally, we evaluated a broader range of facial forgery types using the DiFF [9] dataset, a comprehensive collection of diffusion-generated facial images, which allowed us to test our method on a wider spectrum of forgery techniques.
Dataset Splits	Yes	In line with the standard deepfake benchmark [78], we use the c23 version of FF++ for training and other datasets for testing in the main table. Training and Testing Image Datasets: We trained our model using the FF++ dataset [57]. For preprocessing and cropping, we adopted the methods from DeepfakeBench [78]. We utilized 8 frames per video for training and 32 frames per video for testing.
Hardware Specification	Yes	Training is performed on a single NVIDIA 4090 GPU for 3 epochs, with a learning rate of 2 × 10⁻⁵ in the two-layer MLP projector and 2 × 10⁻⁴ in others, a rank of 16, and an alpha value set conventionally to twice the rank at 32. We use a batch size of 4, a gradient accumulation step of 1, and a warmup ratio of 0.03 to stabilize training. for each epoch on NVIDIA 4090 (Driver Version: 535.247.01; CUDA Version: 12.2), AMD 32-Core cost for 4 hours by training on FF++ [57], each video we take 8 frames for training.
Software Dependencies	Yes	We initialize our model with the LLaVA-base weights and fine-tune the LLaVA model [44] using its official implementation codebase. for each epoch on NVIDIA 4090 (Driver Version: 535.247.01; CUDA Version: 12.2), AMD 32-Core cost for 4 hours by training on FF++ [57], each video we take 8 frames for training. The evaluation process was implemented in Python, leveraging the OpenAI API to interact with GPT-4o. The GUI was developed using Python with the Flask library and hosted on a server.
Experiment Setup	Yes	Training is performed on a single NVIDIA 4090 GPU for 3 epochs, with a learning rate of 2 × 10⁻⁵ in the two-layer MLP projector and 2 × 10⁻⁴ in others, a rank of 16, and an alpha value set conventionally to twice the rank at 32. We use a batch size of 4, a gradient accumulation step of 1, and a warmup ratio of 0.03 to stabilize training.
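
For readers who want to reuse the reported settings, the values quoted in the Experiment Setup and Dataset Splits rows can be collected into a single configuration object. The following is only a minimal sketch: the class name and every field name are hypothetical and do not come from the authors' repository, and the rank and alpha values are assumed to be LoRA settings, which the quoted text implies but does not state. The derived quantities (effective batch size, alpha divided by rank) are standard arithmetic on the reported numbers, not figures given in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedFineTuneConfig:
    """Values quoted in the table above; all field names are hypothetical."""

    epochs: int = 3
    projector_lr: float = 2e-5      # two-layer MLP projector
    other_lr: float = 2e-4          # all other trainable parameters
    rank: int = 16                  # assumed to be the LoRA rank
    alpha: int = 32                 # reported as twice the rank
    batch_size: int = 4
    grad_accum_steps: int = 1
    warmup_ratio: float = 0.03
    train_frames_per_video: int = 8
    test_frames_per_video: int = 32

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step.
        return self.batch_size * self.grad_accum_steps

    @property
    def scaling(self) -> float:
        # alpha / rank, the usual LoRA scaling factor (assumption: LoRA is used).
        return self.alpha / self.rank


cfg = ReportedFineTuneConfig()
print(cfg.effective_batch_size, cfg.scaling)
```

Keeping the settings in one frozen object makes it easy to check a local run against the reported values before comparing results.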