SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Authors: Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos.
Researcher Affiliation | Academia | Shanghai Artificial Intelligence Laboratory; East China Normal University; Shanghai Jiao Tong University; Dept. of Data Science & AI, Monash University; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; S-Lab, Nanyang Technological University
Pseudocode | No | The paper does not contain structured pseudocode or explicitly labeled algorithm blocks.
Open Source Code | Yes | Project page: https://vchitect.github.io/SEINE-project/.
Open Datasets | Yes | We first utilize the WebVid-10M dataset (Bain et al., 2021) as the main training set... We employ the MSR-VTT dataset... on the UCF-101 dataset (Soomro et al., 2012).
Dataset Splits | No | The paper mentions using the WebVid-10M, MSR-VTT, and UCF-101 datasets, and refers to a 'test set' for MSR-VTT and a 'training set' for UCF-101, but it does not give split percentages or sample counts for training, validation, and test sets, so the data partitioning cannot be reproduced.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the LaVie-base and Stable Diffusion models but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries required for reproduction.
Experiment Setup | Yes | Our model is trained on videos of 320×512 resolution with 16 frames. In our model, we set p = 0.15... Our results are generated by DDIM sampling with 100 steps.
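
For context, the sampling configuration quoted in the Experiment Setup row (16-frame videos at 320×512, DDIM sampling with 100 steps) can be expressed as a minimal loop sketch. The sketch below uses the `diffusers` `DDIMScheduler`; only the resolution, frame count, and step count come from the paper. The `denoiser` stub is a hypothetical stand-in for SEINE's random-mask video U-Net (which the paper does not release as pseudocode), the 4-channel 8×-downsampled latent shape assumes a Stable-Diffusion-style VAE, and the paper's p = 0.15 setting is not modeled here.

```python
# Minimal sketch of the reported sampling setup. Only 320x512, 16 frames,
# and 100 DDIM steps come from the paper; everything else (latent shape,
# scheduler defaults, the denoiser stub) is an assumption for illustration.
import torch
from diffusers import DDIMScheduler

NUM_FRAMES = 16           # 16-frame clips (per paper)
HEIGHT, WIDTH = 320, 512  # 320x512 training resolution (per paper)
DDIM_STEPS = 100          # DDIM sampling with 100 steps (per paper)

scheduler = DDIMScheduler(num_train_timesteps=1000)  # SD-style default, assumed
scheduler.set_timesteps(DDIM_STEPS)


def denoiser(latents: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for SEINE's random-mask video U-Net.

    A real run would predict noise from the latents, the timestep, and the
    text/frame conditions; this stub only keeps the sketch runnable.
    """
    return torch.zeros_like(latents)


# Latent video: (batch, channels, frames, H/8, W/8), assuming an SD-style
# VAE with 4 latent channels and 8x spatial downsampling.
latents = torch.randn(1, 4, NUM_FRAMES, HEIGHT // 8, WIDTH // 8)

for t in scheduler.timesteps:
    noise_pred = denoiser(latents, t)
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```

In a real reproduction the stub would be replaced by the released SEINE checkpoint, with the first/last conditioning frames injected through the paper's masking mechanism rather than sampled from pure noise.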