VideoTetris: Towards Compositional Text-to-Video Generation
Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our VideoTetris achieves impressive qualitative and quantitative results in compositional T2V generation. |
| Researcher Affiliation | Collaboration | 1 Peking University, 2 Kuaishou Technology |
| Pseudocode | No | The paper includes equations and structured prompt templates, but not a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code: https://github.com/YangLing0818/VideoTetris |
| Open Datasets | Yes | For the second scenario, we employed the core ControlNet [22]-like branch from StreamingT2V [11] as the backbone and processed the Panda-70M [15] dataset using the Enhanced Video Data Preprocessing methods in section 3.2 as the training set. |
| Dataset Splits | No | The paper uses Panda-70M as the training set and describes a method for generating test prompts, but it does not explicitly provide training/validation/test splits of Panda-70M or a separate validation set. |
| Hardware Specification | Yes | We trained our model with batch size = 2 and learning rate = 1e-5 on 4 A800 GPUs for 16k steps in total. |
| Software Dependencies | No | The paper mentions software such as ControlNet, StreamingT2V, ChatGPT, GPT-4, and LLaMA-3, but does not provide specific version numbers for software dependencies needed for reproducibility, such as Python or PyTorch versions. |
| Experiment Setup | Yes | In the training process, we randomly drop out 5% of text prompts for classifier-free guidance training. We trained our model with batch size = 2 and learning rate = 1e-5 on 4 A800 GPUs for 16k steps in total. ... The hyperparameters in section 3.2 and section 3.3 are shown in Table 8. (A configuration sketch based on these quoted values follows the table.) |
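
The quoted setup maps onto a simple training loop. The sketch below is not the released VideoTetris code: the model, dataloader, and loss interfaces are hypothetical placeholders, and the 4× A800 data-parallel setup is omitted. It only illustrates the reported hyperparameters (batch size 2, learning rate 1e-5, 16k steps) and the 5% text-prompt dropout used for classifier-free guidance training.

```python
# Minimal sketch of the reported training configuration (assumed interfaces,
# not the authors' implementation).
import random
import torch

BATCH_SIZE = 2          # per the paper's reported setup
LEARNING_RATE = 1e-5
TOTAL_STEPS = 16_000
CFG_DROPOUT_P = 0.05    # randomly drop 5% of text prompts


def maybe_drop_prompt(prompt: str) -> str:
    """Replace the prompt with an empty string for classifier-free guidance training."""
    return "" if random.random() < CFG_DROPOUT_P else prompt


def train(model, dataloader, device="cuda"):
    # `model(video, prompts)` is assumed to return the diffusion training loss.
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    step = 0
    while step < TOTAL_STEPS:
        for video, prompts in dataloader:          # hypothetical (video, caption) batches
            prompts = [maybe_drop_prompt(p) for p in prompts]
            loss = model(video.to(device), prompts)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= TOTAL_STEPS:
                break
```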