Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Authors: Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or even surpassing the generation quality of significantly slower and non-causal diffusion models.
Researcher Affiliation	Collaboration	Xun Huang¹, Zhengqi Li¹, Guande He², Mingyuan Zhou², Eli Shechtman¹; ¹Adobe Research, ²The University of Texas at Austin
Pseudocode	Yes	Algorithm 1: Self Forcing Training; Algorithm 2: Autoregressive Diffusion Inference with Rolling KV Cache
Open Source Code	Yes	We include the code and data in the supplemental material.
Open Datasets	Yes	We use the VidProS subset from VidProM [85], which contains around 1M semantically unique user-written text-to-video prompts.
Dataset Splits	No	The paper mentions using a filtered and LLM-extended version of VidProM for sampling text prompts and generating 70k videos as a dataset for training GANs, but it does not specify explicit training, validation, or test splits for these datasets using percentages, sample counts, or predefined splits.
Hardware Specification	Yes	Extensive experiments demonstrate that our model enables real-time video generation at 17 FPS with sub-second latency on a single H100 GPU. Most of our training runs use 64 NVIDIA GPUs (80GB memory each) with a per-GPU batch size of 1.
Software Dependencies	No	The paper mentions several components and models like Wan2.1 [83], FlexAttention [15], FlashAttention-3 [72], and Qwen/Qwen2.5-7B-Instruct [95], but it does not provide specific version numbers for general ancillary software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup	Yes	In Appendix A, under 'Noise schedule and model parameterization' and 'Training details', the paper specifies the few-step diffusion schedule as [1000, 750, 500, 250], batch size (per-GPU batch size of 1 with gradient accumulation), and training duration. Furthermore, Table 3 provides detailed 'Specification of training hyperparameters' including Optimizer (AdamW), Learning rate (2e-6, 4e-7), EMA decay (0.99), and Generator/critic update ratio (5, 1) for DMD, SiD, and GAN objectives.
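
For readers who want the reported settings in one place, the values quoted in the Hardware Specification and Experiment Setup rows can be collected into a small Python sketch. This is an illustration only, not the authors' configuration. The class name `ReportedSetup`, its field names, and the helper `ema_update` are hypothetical. The table does not say which learning rate or update-ratio value applies to which network or objective, so those pairs are stored exactly as reported, and the gradient accumulation factor is not given, so it is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReportedSetup:
    """Values as quoted in the table above; all field names are hypothetical."""

    # Few-step diffusion schedule reported for Appendix A.
    denoising_timesteps: Tuple[int, ...] = (1000, 750, 500, 250)
    optimizer: str = "AdamW"
    # Reported as "(2e-6, 4e-7)"; the table does not say what each value applies to.
    learning_rates: Tuple[float, ...] = (2e-6, 4e-7)
    ema_decay: float = 0.99
    # Reported as "(5, 1)"; kept as a pair without interpretation.
    generator_critic_update_ratio: Tuple[int, ...] = (5, 1)
    # "Most" training runs, per the Hardware Specification row.
    num_gpus: int = 64
    per_gpu_batch_size: int = 1


def ema_update(ema_value: float, new_value: float, decay: float) -> float:
    """Standard exponential moving average step for a single scalar weight."""
    return decay * ema_value + (1.0 - decay) * new_value


setup = ReportedSetup()

# Samples processed per forward/backward pass across all GPUs,
# before any gradient accumulation (accumulation factor not reported here).
samples_per_pass = setup.num_gpus * setup.per_gpu_batch_size
```

With a decay of 0.99, each EMA step keeps 99% of the running average and takes 1% from the current weights, so the average changes slowly relative to the raw parameters.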