Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving Video Generation with Human Feedback
Authors: Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. |
| Researcher Affiliation | Collaboration | MMLab, CUHK; Tsinghua University; Kling Team, Kuaishou Technology; Shanghai Jiao Tong University; Shanghai AI Laboratory |
| Pseudocode | Yes | We provide Flow-DPO pseudo-code in Appendix C. |
| Open Source Code | Yes | We include the training code for our VideoReward model in the supplementary material. The preference dataset and alignment code will be released in the future. |
| Open Datasets | Yes | (1) VideoGen-RewardBench: Built upon the third-party prompt-video dataset VideoGen-Eval [84]... (2) GenAI-Bench [29]: GenAI-Bench features short (2-second) videos generated by pre-Sora-era T2V models |
| Dataset Splits | Yes | We reserve 13 000 triplets whose prompts never appear in training as a validation set. |
| Hardware Specification | Yes | GPUs: 16 × NVIDIA A800 (80 GB) ... GPUs: 8 × NVIDIA A800 (80 GB) |
| Software Dependencies | No | The paper mentions software components like "Qwen2-VL-2B [72]" for the backbone and "Adam [30]" as an optimizer, and uses "LoRA [24]" for fine-tuning, but does not provide specific version numbers for general software libraries or frameworks like PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | Training strategy: LoRA [24]; LoRA alpha: 128; LoRA dropout: 0.0; LoRA r: 64; LoRA target modules: q_proj, k_proj, v_proj, o_proj; optimizer: Adam [30]; learning rate: 5e-6; epochs: 1; batch size: 64; GPUs: 16 × NVIDIA A800 (80 GB) ... We sample videos at 2 fps, at a resolution of approximately 448 × 448 pixels while preserving the original aspect ratio. |
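The Pseudocode row points to Flow-DPO pseudo-code in Appendix C, which is not reproduced on this page. For orientation only, the following is a minimal sketch that assumes Flow-DPO follows the Diffusion-DPO objective, with the noise-prediction error replaced by a velocity-prediction error under a rectified-flow convention. The function names, the `beta` value, and the interpolation convention are illustrative assumptions, not the authors' code.

```python
import math


def velocity_target(x0, noise):
    # Assumed rectified-flow convention: x_t = (1 - t) * x0 + t * noise,
    # so the target velocity dx_t/dt is (noise - x0).
    return [n - x for x, n in zip(x0, noise)]


def squared_error(target, pred):
    # ||target - pred||^2 over a flattened latent.
    return sum((t - p) ** 2 for t, p in zip(target, pred))


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def flow_dpo_loss(err_w_policy, err_w_ref, err_l_policy, err_l_ref, beta):
    """Diffusion-DPO-style loss on velocity errors for one preference pair.

    err_w_* / err_l_*: squared velocity error on the preferred (w) and
    rejected (l) video, under the trainable policy or the frozen reference.
    Hypothetical helper; not the paper's Appendix C pseudo-code.
    """
    # A negative delta means the policy fits that video better than the reference.
    delta_w = err_w_policy - err_w_ref
    delta_l = err_l_policy - err_l_ref
    # The logit grows when the policy improves more on the preferred video.
    logit = -0.5 * beta * (delta_w - delta_l)
    # -log(sigmoid(logit)) == softplus(-logit)
    return softplus(-logit)
```

When the policy and reference errors are identical, the loss equals log 2. It drops below log 2 when the policy lowers its error on the preferred video relative to the reference, and rises above log 2 in the opposite case.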
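The Dataset Splits row describes a validation set of 13 000 triplets whose prompts never appear in training. The following minimal sketch shows one way to build such a prompt-disjoint split. It assumes each triplet is a hypothetical `(prompt, video_a, video_b)` tuple. Grouping by prompt before assigning triplets to a split is what guarantees that no validation prompt leaks into training.

```python
import random
from collections import defaultdict


def prompt_disjoint_split(triplets, val_size, seed=0):
    """Split hypothetical (prompt, video_a, video_b) triplets so that no
    validation prompt ever appears in the training set."""
    by_prompt = defaultdict(list)
    for t in triplets:
        by_prompt[t[0]].append(t)
    # Sort before shuffling so the split is deterministic for a given seed.
    prompts = sorted(by_prompt)
    random.Random(seed).shuffle(prompts)
    train, val = [], []
    for p in prompts:
        # Whole prompt groups go to one side only; validation is filled first.
        if len(val) < val_size:
            val.extend(by_prompt[p])
        else:
            train.extend(by_prompt[p])
    return train, val
```

Because whole prompt groups are moved at once, the validation set can slightly overshoot `val_size`. Hitting the count exactly would require splitting a group, which would break prompt-disjointness.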
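The Experiment Setup row lists the reward-model hyperparameters and the video preprocessing. The sketch below collects them in one place and illustrates a few derived quantities. Several details are assumptions rather than statements from the row:

- The class and function names are hypothetical.
- The per-GPU batch assumes that the batch size of 64 is global, split evenly across the 16 GPUs, with no gradient accumulation.
- Rounding frame sides to multiples of 28 follows Qwen2-VL's 14-pixel patches merged 2 × 2. The row does not state this.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardModelTrainConfig:
    # Values transcribed from the Experiment Setup row; class name is hypothetical.
    lora_r: int = 64
    lora_alpha: int = 128
    lora_dropout: float = 0.0
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    learning_rate: float = 5e-6
    epochs: int = 1
    batch_size: int = 64
    num_gpus: int = 16
    sample_fps: float = 2.0
    target_pixels: int = 448 * 448

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def per_gpu_batch(self) -> int:
        # Assumption: batch_size is global, split evenly, no gradient accumulation.
        return self.batch_size // self.num_gpus


def sample_timestamps(duration_s, fps):
    # Frame times (seconds) when sampling a clip uniformly at `fps`.
    return [i / fps for i in range(math.ceil(duration_s * fps))]


def resize_keep_aspect(height, width, target_pixels=448 * 448, factor=28):
    # Scale so that height * width is close to target_pixels while keeping the
    # aspect ratio. Rounding sides to multiples of `factor` (28) is an
    # assumption based on Qwen2-VL patching, not stated in the paper.
    scale = math.sqrt(target_pixels / (height * width))
    new_h = max(factor, round(height * scale / factor) * factor)
    new_w = max(factor, round(width * scale / factor) * factor)
    return new_h, new_w
```

For a 1080 × 1920 clip, this sketch yields a 336 × 588 frame (197,568 pixels). That is close to the roughly 448 × 448 budget described in the row, and the frame keeps approximately the original aspect ratio.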