Video Diffusion Models

Authors: Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

NeurIPS 2022

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the LLM response quoted as evidence.

Research Type: Experimental
LLM Response: "We report our results on video diffusion models for unconditional video generation (Section 4.1), conditional video generation (Section 4.2), and text-conditioned video generation (Section 4.3). We evaluate our models using standard metrics such as FVD [54], FID [19], and IS [43]; details on evaluation are provided below alongside each benchmark." (A minimal sketch of the Fréchet-distance computation behind FID and FVD follows this table.)

Researcher Affiliation: Industry
LLM Response: "Jonathan Ho jonathanho@google.com, Tim Salimans salimans@google.com, Alexey Gritsenko agritsenko@google.com, William Chan williamchan@google.com, Mohammad Norouzi mnorouzi@google.com, David J. Fleet davidfleet@google.com"

Pseudocode: No
LLM Response: "The paper includes a diagram of the 3D U-Net architecture (Figure 1) but no pseudocode or algorithm blocks." (A hedged sketch of the factorized space-time attention pattern that Figure 1 depicts follows this table.)

Open Source Code: No
LLM Response: "As with prior work in generative modeling, however, our methods have the potential for causing harmful impact and could enhance malicious or unethical uses of generative models, such as fake content generation, harassment, and misinformation spread, and thus we have decided not to release our models."

Open Datasets: Yes
LLM Response: "We use the data loader provided by TensorFlow Datasets [1] without further processing, and we train on all 13,320 videos. ... We evaluate video prediction performance on BAIR Robot Pushing [17]... We additionally evaluate video prediction performance on the Kinetics-600 benchmark [27, 9]." (A sketch of loading UCF101 through TensorFlow Datasets follows this table.)

Dataset Splits: Yes
LLM Response: "For FID and FVD, we report two numbers which are measured against the training and validation sets, respectively. ... We train unconditional models on this dataset at the 64×64 resolution and evaluate on 50 thousand randomly sampled videos from the test set. ... Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]"

Hardware Specification: No
LLM Response: "Architecture hyperparameters, training details, and compute resources are listed in Appendix A." The main paper does not explicitly detail specific hardware.

Software Dependencies: No
LLM Response: "We use the data loader provided by TensorFlow Datasets [1] without further processing... We use the C3D network [51] for calculating FID and IS... we condition the diffusion model on captions in the form of BERT-large embeddings [15]." No version numbers are given for these software components. (A sketch of computing BERT-large caption embeddings follows this table.)

Experiment Setup: Yes
LLM Response: "Table 5 reports results that verify the effectiveness of classifier-free guidance [20] on text-to-video generation. As expected, there is clear improvement in the Inception Score-like metrics with higher guidance weight, while the FID-like metrics improve and then degrade with increasing guidance weight. Similar findings have been reported on text-to-image generation [36]." [Table 5 fragment: frameskip 1, guidance weights 1.0, 2.0, and 5.0; the metric values did not survive extraction.] (A sampling-time sketch of classifier-free guidance follows this table.)

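As context for the evaluation metrics quoted above: FID [19] and FVD [54] both compute the Fréchet distance between Gaussians fitted to feature activations of real and generated samples (Inception features for FID, video-network features for FVD). A minimal sketch, assuming (N, D) arrays of precomputed activations; the function name is illustrative, not from the paper's code:

```python
# Hedged sketch of the Frechet distance underlying FID and FVD: the distance
# between two Gaussians fitted to feature activations of real vs. generated
# samples.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """feats_*: (N, D) arrays of precomputed network activations."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```
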
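Since the paper provides no pseudocode, the following is only a hedged sketch of the factorized space-time attention pattern the 3D U-Net diagram (Figure 1) describes: spatial attention within each frame, then temporal attention across frames at each spatial position. Module and shape choices here are assumptions, not the authors' implementation:

```python
# Hedged sketch of factorized space-time attention for a 3D U-Net block.
import torch
import torch.nn as nn

class FactorizedSpaceTimeAttention(nn.Module):
    def __init__(self, channels, num_heads=8):
        super().__init__()
        # channels must be divisible by num_heads.
        self.spatial = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, time, height, width, channels)
        b, t, h, w, c = x.shape
        # Spatial attention: attend over the h*w positions within each frame.
        xs = x.reshape(b * t, h * w, c)
        xs = xs + self.spatial(xs, xs, xs, need_weights=False)[0]
        # Temporal attention: attend over the t frames at each spatial position.
        xt = xs.reshape(b, t, h, w, c).permute(0, 2, 3, 1, 4).reshape(b * h * w, t, c)
        xt = xt + self.temporal(xt, xt, xt, need_weights=False)[0]
        return xt.reshape(b, h, w, t, c).permute(0, 3, 1, 2, 4)
```
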
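For the Open Datasets entry, a minimal sketch of pulling UCF101 through TensorFlow Datasets, as the quoted passage reports doing. The split and feature names follow current tfds conventions and should be checked against your installed version:

```python
# Hedged sketch of loading UCF101 via TensorFlow Datasets. Depending on the
# tfds version, a builder config may need to be named explicitly; inspect
# tfds.builder("ucf101") to see what is available.
import tensorflow_datasets as tfds

ds = tfds.load("ucf101", split="train", shuffle_files=True)
for example in ds.take(1):
    print(example["video"].shape)  # (num_frames, height, width, 3), uint8
```
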
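For the text-conditioned models, captions are encoded as BERT-large embeddings [15]. The paper does not name a specific library; the sketch below assumes Hugging Face transformers purely for illustration, and "bert-large-uncased" is one plausible checkpoint, not necessarily the one the authors used:

```python
# Hedged sketch of caption conditioning via BERT-large embeddings.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertModel.from_pretrained("bert-large-uncased").eval()

tokens = tokenizer("a robot arm pushes a small object", return_tensors="pt")
with torch.no_grad():
    embeddings = model(**tokens).last_hidden_state  # (1, seq_len, 1024)
```
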
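Finally, for the guidance sweep in Table 5, a minimal sampling-time sketch of classifier-free guidance [20], under the common convention where w = 1.0 recovers the plain conditional model (consistent with 1.0 as the baseline weight in the table). `denoise` is a hypothetical model call, not the authors' API:

```python
# Hedged sketch of classifier-free guidance at sampling time. w = 1.0 gives
# the plain conditional prediction; w > 1.0 pushes samples toward the
# conditional mode, trading diversity for fidelity.
def guided_eps(denoise, z_t, t, caption_emb, w):
    eps_cond = denoise(z_t, t, caption_emb)  # conditioned on the caption
    eps_uncond = denoise(z_t, t, None)       # conditioning dropped
    return eps_uncond + w * (eps_cond - eps_uncond)
```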