reproducibilityindex.ai

Boosting Text-to-Video Generative Model with MLLMs Feedback

Authors: Xun Wu, Shaohan Huang, Guolong Wang, Jing Xiong, Furu Wei

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our comprehensive experiments confirm the effectiveness of both VIDEOPREFER and VIDEORM, representing a significant step forward in the field.
Researcher Affiliation	Collaboration	Xun Wu¹, Shaohan Huang¹, Guolong Wang², Jing Xiong³, Furu Wei¹. ¹ Microsoft Research Asia, ² University of International Business and Economics, ³ The University of Hong Kong
Pseudocode	Yes	Algorithm 1 DRaFT-V: Reward Reinforcement Learning for Fine-tuning Text-to-Video Models with VIDEORM
Open Source Code	No	We will public our data and code upon paper acceptance, due to the management regulations of our institution.
Open Datasets	Yes	VIDEOPREFER, which includes 135,000 preference annotations. Utilizing this dataset, we introduce VIDEORM, the first general-purpose reward model tailored for video preference in the text-to-video domain. Our comprehensive experiments confirm the effectiveness of both VIDEOPREFER and VIDEORM, representing a significant step forward in the field.
Dataset Splits	No	The paper does not explicitly provide specific training/validation/test split percentages or sample counts for its own generated dataset (VIDEOPREFER) or for how it samples from the mixture of existing datasets for training.
Hardware Specification	Yes	All VIDEORM series models are trained in half-precision on 8 32GB NVIDIA V100 GPUs
Software Dependencies	No	The paper mentions software like PyTorch and CLIP but does not provide specific version numbers for these or any other key software components.
Experiment Setup	Yes	All VIDEORM series models are trained in half-precision on 8 32GB NVIDIA V100 GPUs, with a learning rate of 1e-5 and batch size of 64 in total. We set the input frames N = 8.
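
As a rough illustration of the reported setup, the hyperparameters above can be collected in a small config object. This is only a sketch: the class name `VideoRMTrainConfig` and its field names are hypothetical, and the per-GPU batch size assumes the total batch of 64 is split evenly across the 8 GPUs with no gradient accumulation, which the paper excerpt does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoRMTrainConfig:
    """Hypothetical container for the settings quoted in the table."""

    num_gpus: int = 8             # reported: 8 x 32GB NVIDIA V100
    total_batch_size: int = 64    # reported: batch size of 64 in total
    learning_rate: float = 1e-5   # reported
    num_input_frames: int = 8     # reported: N = 8
    half_precision: bool = True   # reported: trained in half-precision

    @property
    def per_gpu_batch_size(self) -> int:
        # Assumption: even split across GPUs, no gradient accumulation.
        if self.total_batch_size % self.num_gpus != 0:
            raise ValueError("total batch size must divide evenly across GPUs")
        return self.total_batch_size // self.num_gpus


cfg = VideoRMTrainConfig()
print(cfg.per_gpu_batch_size)
```

Under that even-split assumption, each GPU would process 8 samples per step. Software versions are not reported, so none are pinned here.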