StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Collaboration | VCIP & TMCC, CS, Nankai University; ByteDance Inc.; NKIARI, Futian, Shenzhen
Pseudocode | Yes | "To make it clearer, we also show the pseudo code in Algorithm ?? in the Appendix." (A hedged sketch of the consistent self-attention idea appears after this table.)
Open Source Code | No | "We intend to make our code publicly available following the paper's acceptance."
Open Datasets | Yes | "Following the previous methods [12, 7], we use the WebVid-10M [2] dataset to train our transition video model." WebVid-10M [2] is a large-scale video dataset of 10 million video clips with associated textual descriptions, designed for training and evaluating models on video understanding and generation tasks. URL: www.robots.ox.ac.uk/~vgg/research/frozen-in-time/
Dataset Splits | No | The paper states "We randomly sample around 1000 videos as the test dataset" but does not specify explicit training/validation/test splits or mention a validation set.
Hardware Specification | Yes | "conduct training 100k iterations for our Semantic Motion Predictor on 8 A100 GPUs." (A hedged multi-GPU training sketch appears after this table.)
Software Dependencies | No | The paper mentions software such as Stable Diffusion XL, Stable Diffusion 1.5, and OpenCLIP ViT-H-14, but does not provide version numbers for these or other ancillary components.
Experiment Setup | Yes | "All comparison models utilize 50-step DDIM sampling [43], and the classifier-free guidance score [18] is consistently set to 5." "We then set our learning rate at 1e-4 and conduct training 100k iterations for our Semantic Motion Predictor on 8 A100 GPUs." (A hedged sampling-configuration sketch appears after this table.)
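
Since the pseudocode row only points at a broken cross-reference (Algorithm ??), the consistent self-attention mechanism named in the title is worth sketching. The following is a minimal, hedged PyTorch sketch of the idea as described in the paper: tokens sampled from the other images in a batch are appended to each image's keys and values so that self-attention is shared across the batch. The function name, the sampling ratio, and the projection modules are assumptions, not the authors' released code.

```python
# Hedged sketch of consistent self-attention, assuming a batch of per-image
# token features and plain scaled-dot-product attention. Names such as
# `consistent_self_attention` and `sample_ratio` are illustrative only.
import torch
import torch.nn.functional as F

def consistent_self_attention(tokens, to_q, to_k, to_v, sample_ratio=0.5):
    """tokens: (B, N, C) features of the B images generated in one batch.
    For each image, tokens sampled from the other images in the batch are
    appended to its keys/values, so attention is shared across the batch."""
    B, N, C = tokens.shape
    outputs = []
    for i in range(B):
        # Pool the tokens of all other images and sample a subset of them.
        others = torch.cat([tokens[j] for j in range(B) if j != i], dim=0)
        num_sampled = int(others.shape[0] * sample_ratio)
        idx = torch.randperm(others.shape[0], device=tokens.device)[:num_sampled]
        kv_tokens = torch.cat([tokens[i], others[idx]], dim=0)   # (N + S, C)
        q = to_q(tokens[i]).unsqueeze(0)                         # (1, N, C)
        k = to_k(kv_tokens).unsqueeze(0)                         # (1, N + S, C)
        v = to_v(kv_tokens).unsqueeze(0)
        outputs.append(F.scaled_dot_product_attention(q, k, v).squeeze(0))
    return torch.stack(outputs, dim=0)                           # (B, N, C)
```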
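For the hardware row, the paper fixes only the GPU count (8 A100s), the iteration budget (100k), and the learning rate (1e-4). A minimal sketch of how such a run is commonly wired up with PyTorch DistributedDataParallel follows; the model and dataloader are placeholders and the optimizer choice is an assumption, not the authors' training code.

```python
# Hedged sketch: 8-GPU data-parallel training for ~100k iterations at lr 1e-4,
# launched e.g. with `torchrun --nproc_per_node=8 train.py`. The model
# (Semantic Motion Predictor) and dataloader are placeholders.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, dataloader, max_steps=100_000, lr=1e-4):
    dist.init_process_group("nccl")
    device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
    model = DDP(model.to(device), device_ids=[device.index])
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # optimizer is assumed
    step = 0
    while step < max_steps:
        for batch in dataloader:
            # Placeholder: the wrapped model is assumed to return its loss.
            loss = model(**{k: v.to(device) for k, v in batch.items()})
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= max_steps:
                break
    dist.destroy_process_group()
```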
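For the comparison protocol in the experiment-setup row (50-step DDIM sampling with a classifier-free guidance scale of 5), a hedged sketch with the diffusers library could look like the following. The SDXL checkpoint id and the prompt are assumptions, and the paper's consistent self-attention hook is not included.

```python
# Minimal sketch: 50-step DDIM sampling with classifier-free guidance of 5,
# matching the comparison settings quoted above. Checkpoint id and prompt
# are assumptions; this is not the authors' released pipeline.
import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a man in a red shirt walking his dog in the park",  # hypothetical prompt
    num_inference_steps=50,   # 50-step DDIM sampling
    guidance_scale=5.0,       # classifier-free guidance score of 5
).images[0]
image.save("sample.png")
```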