Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

Authors: Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, Mingkui Tan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score, validating the superior performance of NSG-VD.
Researcher Affiliation	Academia	(1) South China University of Technology, (2) University of Science and Technology of China, (3) Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, (4) Pazhou Lab, (5) University of Melbourne, (6) Hunan University, (7) Hong Kong Baptist University
Pseudocode	Yes	Algorithm 1: Training deep kernel of MMD. Algorithm 2: Detecting videos with NSG-VD.
Open Source Code	Yes	The source code is available at https://github.com/ZSHsh98/NSG-VD.
Open Datasets	Yes	We evaluate our methods on the GenVideo benchmark [25], a large-scale dataset for AI-generated video detection that includes diverse real-world videos and synthetic content from multiple generative models. We use Kinetics-400 [62] as the real video source, SEINE [63] or Pika [64] as the AI-generated videos for training. The test set comprises MSR-VTT [65] and 10 diverse AI-generated datasets from different generation paradigms.
Dataset Splits	Yes	We use 10,000 real videos from Kinetics-400 and 10,000 generated videos from Pika (Table 1) and SEINE (Table 2) for training, respectively. To thoroughly assess reliability under these conditions, we train all models using 10,000 Kinetics-400 real videos and only 1,000 SEINE-generated videos. Throughout all experiments, we filter videos with fewer than 8 frames and only uniformly sample 8 frames for each video during training and testing.
Hardware Specification	Yes	We conduct our experiments on a server with 1 NVIDIA RTX 3090 GPU using Python 3.10.17 and PyTorch 2.7.0.
Software Dependencies	Yes	We conduct our experiments on a server with 1 NVIDIA RTX 3090 GPU using Python 3.10.17 and PyTorch 2.7.0.
Experiment Setup	Yes	We use the Adam optimizer [90] to optimize the kernel parameters ω with batch size 24, learning rate 0.0001, weight decay 0.1, σϕ = 0.1 and σΦ = 100. For testing, we set the decision threshold τ = 1 in Eqn. (11). For a given video x at the t-th frame, we compute its score feature ∇x log p(x, t) by diffusing xt at diffusion timestep 5/1,000 and passing it through sθ. For the deep kernel ϕG, we employ a single layer of Swin Transformer [89], mapping input features of dimension 8 × 224 × 224 to a 300-dimensional output.
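
The frame-selection rule quoted in the Dataset Splits row (drop videos with fewer than 8 frames, then uniformly sample 8 frames) can be illustrated with a minimal sketch. The quoted text does not say how the uniform sampling is implemented, so the evenly spaced, endpoint-inclusive indexing below is an assumption, and the function name `uniform_frame_indices` is hypothetical rather than taken from the NSG-VD repository.

```python
from typing import List, Optional

NUM_FRAMES = 8  # frames kept per video, as stated in the Dataset Splits row


def uniform_frame_indices(total_frames: int,
                          num_frames: int = NUM_FRAMES) -> Optional[List[int]]:
    """Return evenly spaced frame indices, or None if the video is too short.

    Assumption: "uniformly sample" means evenly spaced indices that include
    the first and last frame. The paper's actual scheme may differ.
    """
    if total_frames < num_frames:
        return None  # video would be filtered out
    if num_frames == 1:
        return [0]
    # Integer arithmetic keeps indices in range and strictly increasing.
    return [(i * (total_frames - 1)) // (num_frames - 1)
            for i in range(num_frames)]


if __name__ == "__main__":
    print(uniform_frame_indices(7))   # too short, filtered out
    print(uniform_frame_indices(15))  # every second frame
```

A caller would skip any video for which the function returns `None` and otherwise decode only the listed frames, which would give the 8-frame clip that the Experiment Setup row describes as the 8 × 224 × 224 input.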