Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Improving Video Generation with Human Feedback

Authors: Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di ZHANG, Kun Gai, Yujiu Yang, Wanli Ouyang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs.
Researcher Affiliation	Collaboration	1 MMLab, CUHK; 2 Tsinghua University; 3 Kling Team, Kuaishou Technology; 4 Shanghai Jiao Tong University; 5 Shanghai AI Laboratory
Pseudocode	Yes	We provide Flow-DPO pseudo-code in Appendix C.
Open Source Code	Yes	We include the training code for our VideoReward model in the Supplementary. The preference dataset and alignment code will be released in the future.
Open Datasets	Yes	(1) VideoGen-RewardBench: Built upon the third-party prompt-video dataset VideoGen-Eval [84]... (2) GenAI-Bench [29]: GenAI-Bench features short (2-seconds) videos generated by pre-Sora-era T2V models
Dataset Splits	Yes	We reserve 13 000 triplets whose prompts never appear in training as a validation set.
Hardware Specification	Yes	GPUs: 16 NVIDIA A800@80G ... GPUs: 8 NVIDIA A800@80G
Software Dependencies	No	The paper mentions software components like "Qwen2-VL-2B [72]" for the backbone and "Adam [30]" as an optimizer, and uses "LoRA [24]" for fine-tuning, but does not provide specific version numbers for general software libraries or frameworks like PyTorch, Python, or CUDA.
Experiment Setup	Yes	Training strategy: LoRA [24]; LoRA alpha: 128; LoRA dropout: 0.0; LoRA R: 64; LoRA target modules: q_proj, k_proj, v_proj, o_proj; Optimizer: Adam [30]; Learning rate: 5e-6; Epochs: 1; Batch size: 64; GPUs: 16 NVIDIA A800@80G ... We sample videos at 2 fps, with a resolution of approximately 448 × 448 pixels while preserving the original aspect ratio.
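
For readers who want to reuse the Experiment Setup values, the sketch below collects them into a plain Python dictionary and illustrates the two preprocessing steps the row mentions (2 fps sampling and resizing to roughly 448 × 448 while keeping the aspect ratio). The key names, both helper functions, the uniform-stride sampling rule, and the "match the pixel area of 448 × 448" reading of "approximately" are assumptions made for illustration; the paper's own code may do this differently.

```python
import math

# Values transcribed from the Experiment Setup row; the key names are illustrative.
REWARD_MODEL_CONFIG = {
    "training_strategy": "LoRA",
    "lora_alpha": 128,
    "lora_dropout": 0.0,
    "lora_r": 64,
    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "optimizer": "Adam",
    "learning_rate": 5e-6,
    "epochs": 1,
    "batch_size": 64,
    "num_gpus": 16,
    "sample_fps": 2,
    "target_side": 448,
}

# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALING = REWARD_MODEL_CONFIG["lora_alpha"] / REWARD_MODEL_CONFIG["lora_r"]


def sampled_frame_indices(num_frames, native_fps, sample_fps=2.0):
    """Hypothetical helper: frame indices at roughly `sample_fps`, uniform stride."""
    if num_frames <= 0 or native_fps <= 0 or sample_fps <= 0:
        return []
    step = native_fps / sample_fps
    count = max(1, math.floor(num_frames / step))
    return [min(num_frames - 1, int(i * step)) for i in range(count)]


def target_size(width, height, side=448):
    """Hypothetical helper: scale so the area is close to side*side, same aspect ratio."""
    scale = math.sqrt((side * side) / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(LORA_SCALING)
    print(sampled_frame_indices(num_frames=48, native_fps=24))
    print(target_size(1280, 720))
```

Matching the pixel area rather than fixing the longer or shorter side is only one plausible interpretation; it keeps the per-frame token count roughly constant across aspect ratios, which is why it was chosen for this sketch.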