Video Diffusion Models with Local-Global Context Guidance
Authors: Siyuan Yang, Lu Zhang, Yu Liu, Zhizhuo Jiang, You He
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed method achieves state-of-the-art performance on video prediction, as well as favorable performance on interpolation and unconditional video generation. |
| Researcher Affiliation | Academia | Siyuan Yang¹, Lu Zhang², Yu Liu¹, Zhizhuo Jiang¹ and You He¹; ¹Tsinghua University, ²Dalian University of Technology. yang-sy21@mails.tsinghua.edu.cn, zhangluu@dlut.edu.cn, {liuyu77360132, heyou_f}@126.com, jiangzhizhuo@sz.tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain a clearly labeled "Pseudocode" or "Algorithm" block. It describes the method in text and with equations. |
| Open Source Code | Yes | We release code at https://github.com/exisas/LGC-VD. |
| Open Datasets | Yes | Cityscapes [Cordts et al., 2016] is a large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities. BAIR Robot Pushing [Ebert et al., 2017] is a common benchmark in the video literature, which consists of roughly 44000 movies of robot pushing motions at 64x64 spatial resolution. |
| Dataset Splits | Yes | This package includes a training set of 2975 videos, a validation set of 500 videos, and a test set of 1525 videos, each with 30 frames. |
| Hardware Specification | Yes | All of our models are trained with Adam on 4 NVIDIA Tesla V100s with a learning rate of 1e-4 and a batch size of 32 for Cityscapes and 192 for BAIR. |
| Software Dependencies | No | The paper mentions using "v-prediction [Salimans and Ho, 2022]" to overcome a problem, but it does not provide specific version numbers for this or any other software dependencies such as libraries or frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | All of our models are trained with Adam on 4 NVIDIA Tesla V100s with a learning rate of 1e-4 and a batch size of 32 for Cityscapes and 192 for BAIR. We use the cosine noise schedule in the training phase and set the diffusion step T to 1000. For both datasets, we set the total video length L to 14, the video length N for each stage to 8, and the number of conditional frames K to 2. At testing, we sample 100 steps using DDPM. |
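
The "Experiment Setup" and "Hardware Specification" rows together pin down the reported training configuration (cosine noise schedule, T = 1000, L = 14, N = 8, K = 2, Adam with lr 1e-4, 100 DDPM sampling steps). As a reading aid, the sketch below implements a standard cosine beta schedule (Nichol and Dhariwal, 2021) and collects the reported settings into a config dict. The function and variable names are illustrative assumptions, not taken from the authors' released code at https://github.com/exisas/LGC-VD.

```python
import math
import torch

def cosine_beta_schedule(T: int = 1000, s: float = 0.008) -> torch.Tensor:
    """Cosine noise schedule (Nichol & Dhariwal, 2021): betas for T diffusion steps."""
    steps = torch.arange(T + 1, dtype=torch.float64)
    alphas_cumprod = torch.cos(((steps / T) + s) / (1 + s) * math.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]   # normalize so that t=0 gives 1
    betas = 1.0 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(max=0.999).float()

# Hyperparameters as quoted in the table above; the layout and key names of this
# config dict are assumptions for illustration, not the released implementation.
config = dict(
    diffusion_steps=1000,      # T, training noise schedule length
    sampling_steps=100,        # DDPM sampling steps at test time
    total_video_length=14,     # L, total video length
    stage_video_length=8,      # N, video length per stage
    conditional_frames=2,      # K, number of conditional frames
    optimizer="Adam",
    learning_rate=1e-4,
    batch_size={"cityscapes": 32, "bair": 192},
)

betas = cosine_beta_schedule(config["diffusion_steps"])
print(betas.shape)  # torch.Size([1000])
```

For the authors' actual training and sampling code, including the local-global context guidance itself, refer to the repository linked in the "Open Source Code" row.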