MCVD - Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Authors: Vikram Voleti, Alexia Jolicoeur-Martineau, Chris Pal

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our approach yields SOTA results across standard video prediction and interpolation benchmarks.
Researcher Affiliation | Collaboration | Vikram Voleti (Mila, University of Montreal, Canada; vikram.voleti@umontreal.ca); Alexia Jolicoeur-Martineau* (Mila, University of Montreal, Canada; alexia.jolicoeur-martineau@mail.mcgill.ca); Christopher Pal (Mila, Polytechnique Montreal, Canada; CIFAR AI Chair; ServiceNow Research)
Pseudocode | No | The paper includes network architecture diagrams (Figure 3) and mathematical formulations but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://mask-cond-video-diffusion.github.io/
Open Datasets | Yes | We show the results of our video prediction experiments on test data that was never seen during training in Tables 1-4 for Stochastic Moving MNIST (SMMNIST) [2], KTH [3], BAIR [4], and Cityscapes [5], respectively. We present unconditional generation results for BAIR in Table 5 and UCF-101 [6] in Table 6, and interpolation results for SMMNIST, KTH, and BAIR in Table 7.
Dataset Splits | Yes | For UCF-101, each video clip is center-cropped at 240×240 and resized to 64×64, taking care to maintain the train-test splits. (See the preprocessing sketch below the table.)
Hardware Specification | No | Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using 4 GPUs. ...we were limited to 4 GPUs for our work here.
Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers.
Experiment Setup | Yes | Unless otherwise specified, we set the mask probability to 0.5 when masking was used. For sampling, we report results using the sampling methods DDPM [Ho et al., 2020] or DDIM [Song et al., 2020] with only 100 sampling steps, though our models were trained with 1000, to make sampling faster. ...all our models are trained to predict only 4-5 current frames at a time. (See the masking and sampling sketch below the table.)
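As a reading aid for the dataset-splits quote above, here is a minimal sketch (assuming PyTorch; not the authors' code) of the described UCF-101 preprocessing: center-crop each frame to 240×240, then resize to 64×64. The function name, tensor layout, and the dummy clip are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def preprocess_clip(frames: torch.Tensor, crop: int = 240, size: int = 64) -> torch.Tensor:
    """Center-crop each frame to crop x crop, then resize to size x size."""
    t, h, w, c = frames.shape                                   # (T, H, W, C), uint8
    top, left = (h - crop) // 2, (w - crop) // 2
    cropped = frames[:, top:top + crop, left:left + crop, :]    # (T, crop, crop, C)
    chw = cropped.permute(0, 3, 1, 2).float() / 255.0           # (T, C, crop, crop) in [0, 1]
    return F.interpolate(chw, size=(size, size), mode="bilinear", align_corners=False)

# Example: a dummy 16-frame clip at UCF-101's native 240x320 resolution.
clip = torch.randint(0, 256, (16, 240, 320, 3), dtype=torch.uint8)
print(preprocess_clip(clip).shape)  # torch.Size([16, 3, 64, 64])
```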
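The experiment-setup quote mentions a mask probability of 0.5 and DDIM sampling with 100 steps from a model trained with 1000. Below is a minimal, hedged sketch of how such conditioning-frame masking and timestep subsampling can look in practice; the helper name, tensor shapes, and the independent past/future masking are assumptions for illustration, not the authors' implementation.

```python
import torch

def mask_conditioning(past: torch.Tensor, future: torch.Tensor, p_mask: float = 0.5):
    """Independently zero out the past/future conditioning frames with prob p_mask,
    so a single model can be trained for prediction (past kept only),
    unconditional generation (neither kept), and interpolation (both kept)."""
    b = past.shape[0]
    keep_past = (torch.rand(b, 1, 1, 1, 1, device=past.device) >= p_mask).float()
    keep_future = (torch.rand(b, 1, 1, 1, 1, device=future.device) >= p_mask).float()
    return past * keep_past, future * keep_future

# Example: 8 clips with 2 past and 2 future 64x64 RGB conditioning frames (B, T, C, H, W).
past = torch.randn(8, 2, 3, 64, 64)
future = torch.randn(8, 2, 3, 64, 64)
masked_past, masked_future = mask_conditioning(past, future)

# "100 sampling steps, though our models were trained with 1000": one common
# DDIM-style choice is to run the reverse process only on an evenly spaced
# subset of the 1000 training timesteps.
train_steps, sample_steps = 1000, 100
timesteps = torch.linspace(0, train_steps - 1, steps=sample_steps).long()
print(timesteps[:5])  # tensor([ 0, 10, 20, 30, 40])
```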