CCVS: Context-aware Controllable Video Synthesis
Authors: Guillaume Le Moing, Jean Ponce, Cordelia Schmid
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with an implementation of the proposed approach give very good qualitative and quantitative results on multiple tasks and standard benchmarks. We assess the merits of CCVS in the light of extensive experiments on various video synthesis tasks. |
| Researcher Affiliation | Academia | (1) Inria and Department of Computer Science, ENS, CNRS, PSL Research University; (2) Center for Data Science, New York University |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, pretrained models, and video samples synthesized by our approach are available at the url https://16lemoing.github.io/ccvs. |
| Open Datasets | Yes | Datasets. BAIR Robot Pushing [19] consists of 43k training and 256 test videos... We also test our method on Audio Set-Drums [25] for sound-conditioned synthesis on music performance, containing 6k and 1k video clips in train and test splits respectively. We evaluate on Kinetics-600 [5], a large and diverse action-recognition dataset with approximately 500K videos. |
| Dataset Splits | No | The paper mentions '43k training and 256 test videos' for BAIR and '6k and 1k video clips in train and test splits' for Audio Set-Drums, but does not explicitly state a validation split for any dataset. No explicit percentages or counts for a validation set were found. |
| Hardware Specification | Yes | All our models are trained on 4 Nvidia V100 GPUs (32GB VRAM each) |
| Software Dependencies | No | The paper mentions software components such as the ADAM optimizer, LiteFlowNet, a VGG network, and an Inception 3D (I3D) network, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | Training details. All our models are trained on 4 Nvidia V100 GPUs (32GB VRAM each), using ADAM [41] optimizer, for multiple 20-hour runs. We adapt the batch size to fill the memory available. We use a learning rate of 0.02 to train the autoencoder, and exponential moving average [83] to obtain its final parameters. We use weighting factors (1, 10, 1, 1) and 0.25 for (λq, λr, λa, λc) and β in Equations (2) and (3) respectively. We use a learning rate of 10^-5 to train the transformer. |
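
The experiment-setup row above quotes the paper's main training hyperparameters (ADAM, learning rates of 0.02 for the autoencoder and 10^-5 for the transformer, EMA of the autoencoder weights, loss weights (1, 10, 1, 1) and β = 0.25). The sketch below shows one way these could be wired up in PyTorch; it is not the authors' released code, the module definitions, loss placeholders, and EMA decay value are hypothetical, and only the quoted hyperparameters come from the paper.

```python
# Minimal sketch of the quoted training configuration (assumptions noted in comments).
import copy
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's frame autoencoder and transformer.
autoencoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 3, 3, padding=1)
)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=4
)

# ADAM optimizers with the learning rates quoted in the experiment-setup row.
ae_optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.02)
tr_optimizer = torch.optim.Adam(transformer.parameters(), lr=1e-5)

# Loss weighting factors (lambda_q, lambda_r, lambda_a, lambda_c) and beta, as quoted.
lambda_q, lambda_r, lambda_a, lambda_c = 1.0, 10.0, 1.0, 1.0
beta = 0.25

# Exponential moving average of the autoencoder parameters, kept in a frozen copy;
# the paper uses the averaged weights as the final parameters. Decay is an assumption.
ema_autoencoder = copy.deepcopy(autoencoder)
for p in ema_autoencoder.parameters():
    p.requires_grad_(False)

def update_ema(model, ema_model, decay=0.999):
    with torch.no_grad():
        for p, ema_p in zip(model.parameters(), ema_model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

# One illustrative autoencoder step on a dummy batch (reconstruction term only,
# weighted by lambda_r; the quantization, adversarial, and consistency terms of
# Equations (2) and (3) are omitted from this sketch).
frames = torch.randn(8, 3, 64, 64)
recon = autoencoder(frames)
loss = lambda_r * nn.functional.mse_loss(recon, frames)
ae_optimizer.zero_grad()
loss.backward()
ae_optimizer.step()
update_ema(autoencoder, ema_autoencoder)
```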