Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments |
| Researcher Affiliation | Collaboration | 1 VCIP & TMCC, CS, Nankai University; 2 ByteDance Inc.; 3 NKIARI, Futian, Shenzhen |
| Pseudocode | Yes | To make it clearer, we also show the pseudo code in Algorithm ?? in the Appendix. |
| Open Source Code | No | We intend to make our code publicly available following the paper's acceptance. |
| Open Datasets | Yes | Following the previous methods [12, 7], we use the Webvid10M [2] dataset to train our transition video model. Webvid-10M: Webvid-10M [2] is a large-scale video dataset featuring 10 million video clips with associated textual descriptions, designed for training and evaluating machine learning models on video understanding and generation tasks. URL: www.robots.ox.ac.uk/~vgg/research/frozen-in-time/ |
| Dataset Splits | No | The paper states 'We randomly sample around 1000 videos as the test dataset' but does not specify explicit training/validation/test splits or mention a validation set. |
| Hardware Specification | Yes | conduct training 100k iterations for our Semantic Motion Predictor on 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions software like Stable Diffusion XL, Stable Diffusion 1.5, and OpenCLIP ViT-H-14, but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | All comparison models utilize 50-step DDIM sampling [43], and the classifier-free guidance score [18] is consistently set to 5. We then set our learning rate at 1e-4 and conduct training 100k iterations for our Semantic Motion Predictor on 8 A100 GPUs. |
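
For readers attempting a reproduction, the settings quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only: the class name, field names, and the `apply_guidance` helper are hypothetical and do not come from the paper or its code. The guidance function uses the commonly used classifier-free guidance formulation (unconditional prediction plus a scaled difference); the summary above does not state which convention the authors follow, so treat that as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    ddim_steps: int = 50             # "50-step DDIM sampling"
    guidance_scale: float = 5.0      # guidance score "consistently set to 5"
    learning_rate: float = 1e-4      # "learning rate at 1e-4"
    train_iterations: int = 100_000  # "training 100k iterations"
    num_gpus: int = 8                # "8 A100 GPUs"


def apply_guidance(eps_uncond, eps_cond, scale):
    """Assumed convention: start from the unconditional noise prediction and
    move toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


setup = ReportedSetup()
# Toy two-element "noise predictions", only to show the arithmetic.
guided = apply_guidance([0.0, 1.0], [1.0, 1.0], setup.guidance_scale)
```

Settings the table marks as unreported (software versions, explicit train/validation/test splits) are deliberately left out of the sketch rather than guessed.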