Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Improving Video Generation with Human Feedback

Authors: Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di ZHANG, Kun Gai, Yujiu Yang, Wanli Ouyang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs.
Researcher Affiliation | Collaboration | (1) MMLab, CUHK; (2) Tsinghua University; (3) Kling Team, Kuaishou Technology; (4) Shanghai Jiao Tong University; (5) Shanghai AI Laboratory
Pseudocode | Yes | We provide Flow-DPO pseudo-code in Appendix C.
Open Source Code | Yes | We include the training code for our VideoReward model in the Supplementary. The preference dataset and alignment code will be released in the future.
Open Datasets | Yes | (1) VideoGen-RewardBench: Built upon the third-party prompt-video dataset VideoGen-Eval [84]... (2) GenAI-Bench [29]: GenAI-Bench features short (2-second) videos generated by pre-Sora-era T2V models
Dataset Splits | Yes | We reserve 13 000 triplets whose prompts never appear in training as a validation set.
Hardware Specification | Yes | GPUs: 16 NVIDIA A800@80G ... GPUs: 8 NVIDIA A800@80G
Software Dependencies | No | The paper mentions software components like "Qwen2-VL-2B [72]" for the backbone and "Adam [30]" as an optimizer, and uses "LoRA [24]" for fine-tuning, but does not provide specific version numbers for general software libraries or frameworks like PyTorch, Python, or CUDA.
Experiment Setup | Yes | Training strategy: LoRA [24]; LoRA alpha: 128; LoRA dropout: 0.0; LoRA R: 64; LoRA target-modules: q_proj, k_proj, v_proj, o_proj; Optimizer: Adam [30]; Learning rate: 5e-6; Epochs: 1; Batch size: 64; GPUs: 16 NVIDIA A800@80G ... We sample videos at 2 fps, with a resolution of approximately 448 × 448 pixels while preserving the original aspect ratio.
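
For readers who want to reuse the Experiment Setup values, the sketch below collects them in one place and illustrates the stated video preprocessing. It is not the authors' code. The class and function names (`RewardModelTrainingConfig`, `sampled_frame_indices`, `target_size`) are hypothetical, the underscores in the module names are assumed to have been lost in extraction, and reading "approximately 448 × 448 pixels" as a target pixel area is an assumption about what the paper means.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardModelTrainingConfig:
    """Values copied from the Experiment Setup row; the container itself is hypothetical."""
    lora_r: int = 64
    lora_alpha: int = 128
    lora_dropout: float = 0.0
    # Assumed spelling: the row shows "q proj,k proj,v proj,o proj".
    lora_target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    optimizer: str = "Adam"
    learning_rate: float = 5e-6
    epochs: int = 1
    batch_size: int = 64
    num_gpus: int = 16
    sample_fps: float = 2.0
    target_side: int = 448


def sampled_frame_indices(num_frames: int, native_fps: float,
                          sample_fps: float = 2.0) -> list[int]:
    """Return frame indices that keep roughly `sample_fps` frames per second."""
    if num_frames <= 0 or native_fps <= 0 or sample_fps <= 0:
        raise ValueError("num_frames, native_fps and sample_fps must be positive")
    duration_s = num_frames / native_fps
    # Keep at least one frame, even for clips shorter than one sampling interval.
    count = max(1, int(duration_s * sample_fps))
    step = native_fps / sample_fps
    return [min(num_frames - 1, int(i * step)) for i in range(count)]


def target_size(width: int, height: int, side: int = 448) -> tuple[int, int]:
    """Scale (width, height) so the area is close to side*side, keeping aspect ratio.

    Assumption: "approximately 448 x 448" is interpreted as a pixel budget.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = math.sqrt((side * side) / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    cfg = RewardModelTrainingConfig()
    print(cfg)
    print(sampled_frame_indices(48, 24.0, cfg.sample_fps))
    print(target_size(1280, 720, cfg.target_side))
```

Under these assumptions, a 2-second clip recorded at 24 fps keeps four frames, and a 1280 × 720 frame is scaled to roughly the same pixel count as a 448 × 448 square without changing its aspect ratio.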