Boosting Text-to-Video Generative Model with MLLMs Feedback
Authors: Xun Wu, Shaohan Huang, Guolong Wang, Jing Xiong, Furu Wei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments confirm the effectiveness of both VIDEOPREFER and VIDEORM, representing a significant step forward in the field. |
| Researcher Affiliation | Collaboration | Xun Wu1, Shaohan Huang1 (corresponding author), Guolong Wang2, Jing Xiong3, Furu Wei1; 1 Microsoft Research Asia, 2 University of International Business and Economics, 3 The University of Hong Kong |
| Pseudocode | Yes | Algorithm 1 DRaFT-V: Reward Reinforcement Learning for Fine-tuning Text-to-Video Models with VIDEORM |
| Open Source Code | No | We will make our data and code public upon paper acceptance, due to the management regulations of our institution. |
| Open Datasets | Yes | VIDEOPREFER, which includes 135,000 preference annotations. Utilizing this dataset, we introduce VIDEORM, the first general-purpose reward model tailored for video preference in the text-to-video domain. Our comprehensive experiments confirm the effectiveness of both VIDEOPREFER and VIDEORM, representing a significant step forward in the field. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test split percentages or sample counts for its own generated dataset (VIDEOPREFER) or for how it samples from the mixture of existing datasets for training. |
| Hardware Specification | Yes | All VIDEORM series models are trained in half-precision on 8 32GB NVIDIA V100 GPUs |
| Software Dependencies | No | The paper mentions software like PyTorch and CLIP but does not provide specific version numbers for these or any other key software components. |
| Experiment Setup | Yes | All VIDEORM series models are trained in half-precision on 8 32GB NVIDIA V100 GPUs, with a learning rate of 1e-5 and batch size of 64 in total. We set the input frames N = 8. |
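
The Hardware Specification and Experiment Setup rows above fix the reward-model training recipe (half precision, 8x 32GB V100 GPUs, learning rate 1e-5, total batch size 64, N = 8 input frames) but not the objective. Below is a minimal sketch of what a pairwise preference step for a VIDEORM-style model could look like, assuming a CLIP-backed frame encoder and a Bradley-Terry loss over (preferred, rejected) video pairs; the architecture, optimizer, and all class and function names are assumptions, not details confirmed by the paper excerpt.

```python
# Hedged sketch of a VIDEORM-style pairwise preference training step.
# Assumptions (not stated in the excerpt): the reward model scores a video by
# encoding N = 8 sampled frames with a CLIP-style backbone, pooling them, and
# comparing to the prompt embedding; training uses a Bradley-Terry loss over
# (preferred, rejected) video pairs. All names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VideoRewardModel(nn.Module):
    """Toy stand-in for VIDEORM: frame encoder + text encoder + temporal pooling."""

    def __init__(self, frame_encoder: nn.Module, text_encoder: nn.Module, dim: int = 512):
        super().__init__()
        self.frame_encoder = frame_encoder   # e.g. a CLIP image tower (assumed)
        self.text_encoder = text_encoder     # e.g. a CLIP text tower (assumed)
        self.temporal_pool = nn.Linear(dim, dim)

    def forward(self, frames: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # frames: (B, N, C, H, W) with N = 8 sampled frames per video
        b, n = frames.shape[:2]
        f = self.frame_encoder(frames.flatten(0, 1)).view(b, n, -1).mean(dim=1)
        f = self.temporal_pool(f)
        t = self.text_encoder(text_tokens)
        # Reward = cosine similarity between pooled video and prompt embeddings
        return F.cosine_similarity(f, t, dim=-1)


def preference_step(model, optimizer, frames_win, frames_lose, text_tokens):
    """One Bradley-Terry update: the preferred video should outscore the rejected one."""
    r_win = model(frames_win, text_tokens)
    r_lose = model(frames_lose, text_tokens)
    loss = -F.logsigmoid(r_win - r_lose).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hyperparameters from the Experiment Setup row (lr 1e-5, total batch size 64,
# N = 8 frames, half precision); the AdamW choice is an assumption.
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```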
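
The Pseudocode row cites Algorithm 1, "DRaFT-V: Reward Reinforcement Learning for Fine-tuning Text-to-Video Models with VIDEORM". The sketch below is one plausible reading of such a step, assuming the generic DRaFT recipe of back-propagating a differentiable reward through only the final few denoising steps; the paper's actual algorithm may differ, and every method on `video_diffusion` is a placeholder rather than a real API.

```python
# Hedged sketch of a DRaFT-V-style update, inferred only from the Algorithm 1
# title in the table above. It follows the general DRaFT idea of truncated
# back-propagation through the last K denoising steps into a differentiable
# reward (here, a frozen VIDEORM-style model). All object methods are
# hypothetical placeholders, not a confirmed implementation.
import torch


def draft_v_step(video_diffusion, reward_model, prompt, optimizer,
                 num_steps: int = 50, grad_steps: int = 1):
    """One reward fine-tuning step: sample, score with the reward model, back-propagate."""
    # 1) Run most of the sampling chain without gradients (cheap).
    with torch.no_grad():
        latents = video_diffusion.sample_partial(prompt, steps=num_steps - grad_steps)

    # 2) Keep gradients only for the last few denoising steps (DRaFT-K truncation).
    latents = video_diffusion.denoise(latents, prompt, steps=grad_steps)

    # 3) Decode to frames and score with the frozen reward model.
    frames = video_diffusion.decode(latents)            # (B, N, C, H, W)
    reward = reward_model(frames, video_diffusion.tokenize(prompt))

    # 4) Maximize reward, i.e. minimize its negative mean.
    loss = -reward.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()
```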