SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Authors: Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos. |
| Researcher Affiliation | Academia | ¹Shanghai Artificial Intelligence Laboratory; ²East China Normal University; ³Shanghai Jiao Tong University; ⁴Dept. of Data Science & AI, Monash University; ⁵Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; ⁶S-Lab, Nanyang Technological University |
| Pseudocode | No | The paper does not contain structured pseudocode or explicitly labeled algorithm blocks (a hedged sketch of the random-mask conditioning is given below the table). |
| Open Source Code | Yes | Project page: https://vchitect.github.io/SEINE-project/. |
| Open Datasets | Yes | We first utilize the WebVid-10M dataset (Bain et al., 2021) as the main training set... We employ the MSR-VTT dataset... on the UCF-101 dataset (Soomro et al., 2012). |
| Dataset Splits | No | The paper mentions using the WebVid-10M, MSR-VTT, and UCF-101 datasets, and refers to a ‘test set’ for MSR-VTT and a ‘training set’ for UCF-101, but it does not provide specific split percentages or sample counts for training, validation, and test sets, which would be needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the LaVie-base and Stable Diffusion models but does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries required for reproduction. |
| Experiment Setup | Yes | Our model is trained on videos of 320×512 resolution with 16 frames. In our model, we set p = 0.15... Our results are generated by the DDIM sampling of 100 steps. (See the sampling sketch below the table.) |
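The paper trains SEINE as a random-mask video diffusion model and reports p = 0.15, but publishes no pseudocode. Below is a minimal sketch of frame-level random masking under stated assumptions: the function name, the (B, C, T, H, W) tensor layout, and the reading of p as a per-frame keep probability are illustrative guesses, not the authors' implementation.

```python
import torch

def make_masked_condition(video: torch.Tensor, p: float = 0.15):
    """Illustrative random-mask conditioning for a video diffusion model.

    video: latent clip of shape (B, C, T, H, W), e.g. T = 16 frames.
    Each frame is independently kept as a visible condition frame with
    probability p; the rest are zeroed. The binary mask is returned so it
    can be concatenated with the noisy latents as extra model input.
    """
    b, _, t, _, _ = video.shape
    keep = (torch.rand(b, 1, t, 1, 1, device=video.device) < p).float()
    return video * keep, keep

# Example: a 16-frame latent clip with roughly 15% of frames left visible.
clip = torch.randn(2, 4, 16, 40, 64)
masked_clip, mask = make_masked_condition(clip, p=0.15)
```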
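For the reported inference configuration (DDIM sampling with 100 steps over 16 frames at 320×512), the following sketch uses the diffusers DDIMScheduler. The denoiser stub, the 4-channel latent space, and the 8× VAE downsampling factor are assumptions; SEINE builds on LaVie-base, and its actual model interface is not published in the paper.

```python
import torch
from diffusers import DDIMScheduler

def denoiser(latents: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Stand-in for the SEINE denoising UNet; a real run would call the
    # trained model (conditioned on masked frames and a text prompt).
    return torch.zeros_like(latents)

scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(num_inference_steps=100)  # 100 DDIM steps, as reported

# 16-frame latents at 320x512, assuming 4 latent channels and 8x downsampling.
latents = torch.randn(1, 4, 16, 320 // 8, 512 // 8)
for t in scheduler.timesteps:
    noise_pred = denoiser(latents, t)
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```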