VideoTetris: Towards Compositional Text-to-Video Generation

Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our VideoTetris achieves impressive qualitative and quantitative results in compositional T2V generation. |
| Researcher Affiliation | Collaboration | 1 Peking University, 2 Kuaishou Technology |
| Pseudocode | No | The paper includes equations and structured prompt templates, but no formally labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Code: https://github.com/YangLing0818/VideoTetris |
| Open Datasets | Yes | For the second scenario, we employed the core ControlNet [22]-like branch from StreamingT2V [11] as the backbone and processed the Panda-70M [15] dataset using the Enhanced Video Data Preprocessing methods in Section 3.2 as the training set. |
| Dataset Splits | No | The paper names Panda-70M as the training set and describes a method for generating test prompts, but does not explicitly detail training/validation/test splits of Panda-70M itself, nor an explicit validation set. |
| Hardware Specification | Yes | We trained our model with batch size = 2 and learning rate = 1e-5 on 4 A800 GPUs for 16k steps in total. |
| Software Dependencies | No | The paper mentions software such as ControlNet, StreamingT2V, ChatGPT-3, GPT-4, and LLaMA-34, but does not provide specific version numbers for dependencies needed for reproducibility, such as Python or PyTorch versions. |
| Experiment Setup | Yes | In the training process, we randomly drop out 5% of text prompts for classifier-free guidance training. We trained our model with batch size = 2 and learning rate = 1e-5 on 4 A800 GPUs for 16k steps in total. ... The hyperparameters in Sections 3.2 and 3.3 are shown in Table 8. (A hedged sketch of these training settings follows the table.) |
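The Experiment Setup row quotes the training hyperparameters directly from the paper. The following is a minimal sketch, not the authors' released code, of how a 5% text-prompt dropout for classifier-free guidance is commonly wired into a training step. Only the numeric settings (batch size 2, learning rate 1e-5, 16k steps, 5% dropout) come from the paper; the model, text encoder, data, and loss are hypothetical placeholders.

```python
# Hedged sketch of the quoted training configuration; placeholders stand in for the real T2V pipeline.
import random
import torch

BATCH_SIZE = 2           # per the paper: batch size = 2
LEARNING_RATE = 1e-5     # per the paper: learning rate = 1e-5
TOTAL_STEPS = 16_000     # per the paper: 16k steps in total (on 4 A800 GPUs)
PROMPT_DROP_PROB = 0.05  # per the paper: randomly drop 5% of text prompts

def maybe_drop_prompt(prompt: str) -> str:
    """Replace the caption with an empty string 5% of the time so the model
    also learns the unconditional branch used by classifier-free guidance."""
    return "" if random.random() < PROMPT_DROP_PROB else prompt

# --- hypothetical placeholders (not the VideoTetris architecture) ---
class DummyT2VModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, latents, text_emb):
        return self.proj(latents) + text_emb

def encode_text(prompt: str) -> torch.Tensor:
    # Stand-in for a real text encoder; empty prompts map to a "null" embedding.
    return torch.zeros(1, 64) if prompt == "" else torch.randn(1, 64)

model = DummyT2VModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

for step in range(TOTAL_STEPS):
    prompts = ["a cute brown dog, then a sleepy cat joins"] * BATCH_SIZE
    latents = torch.randn(BATCH_SIZE, 64)
    text_emb = torch.cat([encode_text(maybe_drop_prompt(p)) for p in prompts])

    pred = model(latents, text_emb)
    loss = torch.nn.functional.mse_loss(pred, latents)  # placeholder objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    break  # demo: run a single step; the paper trains for the full 16k steps
```

In the actual setup this loop would be distributed across the 4 A800 GPUs and applied to the video diffusion backbone with its denoising objective, none of which the sketch reproduces.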