Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
VideoComposer: Compositional Video Synthesis with Motion Controllability
Authors: Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results suggest that VideoComposer is able to control the spatial and temporal patterns simultaneously within a synthesized video in various forms, such as text description, sketch sequence, reference video, or even simply hand-crafted motions. |
| Researcher Affiliation | Industry | ¹Alibaba Group, ²Ant Group |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models are publicly available at https://videocomposer.github.io. |
| Open Datasets | Yes | To optimize VideoComposer, we leverage two widely recognized and publicly accessible datasets: WebVid10M [2] and LAION-400M [51]. |
| Dataset Splits | No | The paper mentions using WebVid10M and LAION-400M for training and MSR-VTT for text-to-video generation evaluation, but it does not specify explicit training, validation, and test splits for these datasets with percentages or sample counts. |
| Hardware Specification | No | The paper mentions 'GPUs' being used for training, but does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU types, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions using FlashAttention [12] and extending Stable Diffusion 2, but does not provide specific version numbers for these or other software libraries or frameworks required for replication. |
| Experiment Setup | Yes | We adopt AdamW [35] as the default optimizer with a learning rate set to 5 × 10⁻⁵. In total, VideoComposer is pre-trained for 400k steps, with the first and second stage being pre-trained for 132k steps and 268k steps, respectively. We use center crop and randomly sample video frames to compose the video input whose F = 16, H = 256 and W = 256. During the second stage pre-training, we adhere to [28], using a probability of 0.1 to keep all conditions, a probability of 0.1 to discard all conditions, and an independent probability of 0.5 to keep or discard a specific condition. |
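
The condition dropout described in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: the function name, the condition names, and the reading that the three cases are mutually exclusive (so the per-condition coin flip applies to the remaining 0.8 of samples) are all assumptions made for illustration.

```python
import random

# Hypothetical condition names; the summary above only mentions forms such as
# text descriptions, sketch sequences, and hand-crafted motions.
CONDITIONS = ["text", "sketch", "motion"]


def sample_condition_mask(conditions, rng,
                          p_keep_all=0.1, p_drop_all=0.1, p_keep_each=0.5):
    """Return {condition: kept?} for one training sample.

    Assumed interpretation: with probability p_keep_all every condition is
    kept, with probability p_drop_all every condition is discarded, and
    otherwise each condition is kept independently with probability
    p_keep_each.
    """
    u = rng.random()  # uniform in [0, 1)
    if u < p_keep_all:
        return {c: True for c in conditions}
    if u < p_keep_all + p_drop_all:
        return {c: False for c in conditions}
    return {c: rng.random() < p_keep_each for c in conditions}


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_condition_mask(CONDITIONS, rng))
```

Under this reading, a discarded condition would simply not be fed to the model for that sample, which is the usual way such a schedule supports classifier-free guidance at inference time; the paper and its reference [28] remain the authority on the exact procedure.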