CCVS: Context-aware Controllable Video Synthesis
Authors: Guillaume Le Moing, Jean Ponce, Cordelia Schmid
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with an implementation of the proposed approach give very good qualitative and quantitative results on multiple tasks and standard benchmarks. We assess the merits of CCVS in the light of extensive experiments on various video synthesis tasks. |
| Researcher Affiliation | Academia | (1) Inria and Department of Computer Science, ENS, CNRS, PSL Research University; (2) Center for Data Science, New York University |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, pretrained models, and video samples synthesized by our approach are available at the url https://16lemoing.github.io/ccvs. |
| Open Datasets | Yes | Datasets. BAIR Robot Pushing [19] consists of 43k training and 256 test videos... We also test our method on Audio Set-Drums [25] for sound-conditioned synthesis on music performance, containing 6k and 1k video clips in train and test splits respectively. We evaluate on Kinetics-600 [5], a large and diverse action-recognition dataset with approximately 500K videos. |
| Dataset Splits | No | The paper mentions '43k training and 256 test videos' for BAIR and '6k and 1k video clips in train and test splits' for Audio Set-Drums, but does not explicitly state a validation split for any dataset. No explicit percentages or counts for a validation set were found. |
| Hardware Specification | Yes | All our models are trained on 4 Nvidia V100 GPUs (32GB VRAM each) |
| Software Dependencies | No | The paper mentions software components such as the ADAM optimizer, LiteFlowNet, a VGG network, and an Inception 3D (I3D) network, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | Training details. All our models are trained on 4 Nvidia V100 GPUs (32GB VRAM each), using ADAM [41] optimizer, for multiple 20-hour runs. We adapt the batch size to fill the memory available. We use a learning rate of 0.02 to train the autoencoder, and exponential moving average [83] to obtain its final parameters. We use weighting factors (1, 10, 1, 1) and 0.25 for (λq, λr, λa, λc) and β in Equations (2) and (3) respectively. We use a learning rate of 10^-5 to train the transformer. |
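
The experiment-setup row above quotes the paper's main training hyperparameters (ADAM, learning rates of 0.02 for the autoencoder and 10^-5 for the transformer, EMA of the autoencoder weights, loss weights (1, 10, 1, 1) and β = 0.25). The sketch below shows one way these could be wired up in PyTorch; it is not the authors' released code, the module definitions, loss placeholders, and EMA decay value are hypothetical, and only the quoted hyperparameters come from the paper.

```python
# Minimal sketch of the quoted training configuration (assumptions noted in comments).
import copy
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's frame autoencoder and transformer.
autoencoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 3, 3, padding=1)
)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=4
)

# ADAM optimizers with the learning rates quoted in the experiment-setup row.
ae_optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.02)
tr_optimizer = torch.optim.Adam(transformer.parameters(), lr=1e-5)

# Loss weighting factors (lambda_q, lambda_r, lambda_a, lambda_c) and beta, as quoted.
lambda_q, lambda_r, lambda_a, lambda_c = 1.0, 10.0, 1.0, 1.0
beta = 0.25

# Exponential moving average of the autoencoder parameters, kept in a frozen copy;
# the paper uses the averaged weights as the final parameters. Decay is an assumption.
ema_autoencoder = copy.deepcopy(autoencoder)
for p in ema_autoencoder.parameters():
    p.requires_grad_(False)

def update_ema(model, ema_model, decay=0.999):
    with torch.no_grad():
        for p, ema_p in zip(model.parameters(), ema_model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

# One illustrative autoencoder step on a dummy batch (reconstruction term only,
# weighted by lambda_r; the quantization, adversarial, and consistency terms of
# Equations (2) and (3) are omitted from this sketch).
frames = torch.randn(8, 3, 64, 64)
recon = autoencoder(frames)
loss = lambda_r * nn.functional.mse_loss(recon, frames)
ae_optimizer.zero_grad()
loss.backward()
ae_optimizer.step()
update_ema(autoencoder, ema_autoencoder)
```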