Flexible Diffusion Modeling of Long Videos

Authors: William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, Frank Wood

NeurIPS 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: "We perform our main comparisons on the video completion task. In keeping with Saxena et al. [26], we condition on the first 36 frames of each video and sample the remainder. We present results on three datasets: GQN-Mazes [10], in which videos are 300 frames long; MineRL Navigate [13, 26] (which we will from now on refer to as simply MineRL), in which videos are 500 frames long; and the CARLA Town01 dataset we release, for which videos are 1000 frames long."
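
The quoted setup maps directly onto the paper's notion of a sampling scheme: the first 36 frames are observed and the remaining frames are generated in stages, with at most K frames in view at once (K = 20 per the Experiment Setup row below). The following is a minimal, hypothetical sketch of building the simplest such scheme; the block size, context size, and `completion_scheme` helper are illustrative assumptions, not the paper's actual schemes (the paper compares several, e.g. autoregressive and hierarchical variants).

```python
# Hypothetical sketch: build a simple autoregressive sampling scheme for
# video completion. Condition on the first `num_given` frames, then generate
# the rest in blocks, keeping at most `k` frames represented per stage.
def completion_scheme(num_frames, num_given=36, k=20, context=10):
    scheme = []  # list of (latent_indices, observed_indices) stages
    next_frame = num_given
    while next_frame < num_frames:
        block = list(range(next_frame, min(next_frame + (k - context), num_frames)))
        observed = list(range(next_frame - context, next_frame))  # recent context frames
        scheme.append((block, observed))
        next_frame = block[-1] + 1
    return scheme

# e.g. a 300-frame GQN-Mazes video: condition on frames 0-35, sample 36-299
stages = completion_scheme(300)
```
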
Researcher Affiliation: Collaboration
LLM Response: "William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, Frank Wood; Department of Computer Science, University of British Columbia, Vancouver, Canada; {wsgh,saeidnp,vadmas,weilbach,fwood}@cs.ubc.ca" and "Frank Wood is also affiliated with the Montréal Institute for Learning Algorithms (Mila) and Inverted AI."
Pseudocode: Yes
LLM Response: "Algorithm 1: Sample a video v given a sampling scheme [(X_s, Y_s)]_{s=1}^S" and "Algorithm 2: Sampling training tasks X, Y ~ u(·) given N, K"
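
Algorithm 1 as quoted is a loop over the stages of a sampling scheme like the one sketched above. Below is a minimal sketch of that loop, assuming a hypothetical `model.sample(...)` conditional diffusion sampler; the real interface is in the repository linked in the Open Source Code row.

```python
import torch

# Minimal sketch of the Algorithm 1 loop: fill in a video stage by stage.
# `model.sample` is a hypothetical stand-in for FDM's conditional diffusion
# sampler; see the released code for the actual interface.
def sample_video(model, scheme, num_frames, frame_shape):
    video = torch.zeros(num_frames, *frame_shape)
    for latent_idx, observed_idx in scheme:  # the (X_s, Y_s) pairs, s = 1..S
        conditioning = video[observed_idx]   # frames already given or sampled
        video[latent_idx] = model.sample(latent_idx, observed_idx, conditioning)
    return video
```
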
Open Source Code: Yes
LLM Response: "We include our released source code but we leave its exposition to the supplementary material." and https://github.com/plai-group/flexible-video-diffusion-modeling
Open Datasets: Yes
LLM Response: "We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator." and "We release the CARLA Town01 dataset along with code and our trained regression model to allow future comparisons." and "We present results on three datasets: GQN-Mazes [10], in which videos are 300 frames long; MineRL Navigate [13, 26] (which we will from now on refer to as simply MineRL), in which videos are 500 frames long; and the CARLA Town01 dataset we release, for which videos are 1000 frames long."
Dataset Splits: No
LLM Response: The paper mentions "408 training and 100 test videos" for the CARLA Town01 dataset, but it does not explicitly state the use of a separate validation set or specify validation splits for any of the datasets.
Hardware Specification: No
LLM Response: The paper states that computational resources were provided by WestGrid, Compute Canada, and Advanced Research Computing at the University of British Columbia, and that details are in the appendix, but it does not give specific hardware models (e.g., GPU/CPU types) in the main text.
Software Dependencies: No
LLM Response: The paper mentions that the work is implemented in PyTorch and links to its source code, but it does not list version numbers for dependencies such as Python, PyTorch, or CUDA.
Experiment Setup: Yes
LLM Response: "We train FDM in all cases with the maximum number of represented frames K = 20."
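
To make the role of K concrete: during training, Algorithm 2 draws a task (X, Y) with at most K = 20 frames represented in total. The sketch below is hedged, assuming a placeholder uniform choice of frames in place of the paper's actual task distribution u(·); it only illustrates how N (video length) and K enter.

```python
import random

# Hedged sketch of drawing a training task (X, Y) ~ u(.) given N and K.
# The paper's u covers a family of task distributions; here we simply pick
# K frames uniformly and split them at random into latent X and observed Y.
def sample_training_task(n_frames, k=20):
    frames = sorted(random.sample(range(n_frames), k))
    n_latent = random.randint(1, k)                    # at least one frame to denoise
    latent = set(random.sample(frames, n_latent))
    x = sorted(latent)                                 # X: frames the model denoises
    y = sorted(f for f in frames if f not in latent)   # Y: conditioning frames
    return x, y
```

Because every training task fits within the K-frame budget, the same trained network can later be queried with any test-time sampling scheme whose stages also respect that budget, which is what makes the staged completion of 300- to 1000-frame videos possible.
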