On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Authors: Xiufeng Song, Xiao Guo, Jiache Zhang, Qirui Li, Lei Bai, Xiaoming Liu, Guangtao Zhai, Xiaohong Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MM-Det achieves state-of-the-art performance in DVF, demonstrating the effectiveness of our algorithm. Both source code and DVF are available at link. |
| Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University, (2) Michigan State University, (3) Shanghai Artificial Intelligence Laboratory. Emails: {akikaze, zjc he, iapple1, zhaiguangtao, xiaohongliu}@sjtu.edu.cn; {guoxia11, liuxm}@cse.msu.edu; baisanshi@gmail.com. Corresponding Author |
| Pseudocode | No | The paper describes the system architecture and components (LMM Branch, ST Branch, Dynamic Fusion) in detail through text and diagrams (e.g., Fig. 4), but it does not present formal pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Both source code and DVF are available at link. |
| Open Datasets | Yes | We construct a large-scale dataset for the video forensic task named Diffusion Video Forensics (DVF), as shown in Fig. 6. DVF contains 8 diffusion generative methods, including Stable Diffusion [42], VideoCrafter1 [5], Zeroscope, Sora, Pika, Open Sora, Stable Video, and Stable Video Diffusion [4]. ... Both source code and DVF are available at link. |
| Dataset Splits | Yes | In training, 1,000 videos from YouTube and 1,973 fake videos generated by Stable Video Diffusion serve as the training set, in which 90% are used for training and the remaining 10% for validation. |
| Hardware Specification | Yes | As for the experimental resources in training and inference, we conduct all experiments using a single NVIDIA RTX 3090 GPU and a maximum of 200G memory. |
| Software Dependencies | No | The paper mentions key software components such as 'LLaVA [28] v1.5', the 'CLIP [39] encoder E of CLIP-ViT-L-patch14-336', the 'large language model D of Vicuna-7b', 'LoRA [21]', and the 'Adam optimizer'. While LLaVA v1.5 and Vicuna-7b have versions, a comprehensive list of all critical software dependencies (e.g., Python, PyTorch, CUDA versions) with specific version numbers is not provided for full reproducibility. |
| Experiment Setup | Yes | We use an Adam optimizer with the learning rate set as 2e-5 for 10 epochs. ... The training set is split into 9:1 for training and validation data. For each video, successive 10 frames are randomly sampled and cropped into 224×224 as the input. ... We use an Adam optimizer with the learning rate set as 1e-4 for training until the model converges. |
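The data-preparation steps quoted above (a 9:1 train/validation split and random sampling of 10 successive frames per video) can be sketched in plain Python. This is a minimal illustration of the described protocol, not the authors' released code; the function names, seeds, and cropping omission are assumptions.

```python
import random

def split_train_val(video_paths, val_ratio=0.1, seed=0):
    """Shuffle and split a list of video paths 9:1 into train/validation sets,
    matching the 90%/10% split described in the paper."""
    paths = list(video_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_ratio)
    return paths[n_val:], paths[:n_val]

def sample_clip_indices(num_frames, clip_len=10, seed=None):
    """Randomly choose the start of a run of `clip_len` successive frames,
    as in the paper's per-video sampling (cropping to 224x224 would follow)."""
    rng = random.Random(seed)
    start = rng.randint(0, num_frames - clip_len)
    return list(range(start, start + clip_len))
```

A usage example: `split_train_val([f"video_{i}.mp4" for i in range(2973)])` yields a 2,676/297 split, and `sample_clip_indices(300)` returns 10 consecutive frame indices within a 300-frame video.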