Few-shot Video-to-Video Synthesis

Authors: Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Bryan Catanzaro, Jan Kautz

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experimental validations with comparisons to strong baselines using several large-scale video datasets, including human-dancing videos, talking-head videos, and street-scene videos. The experimental results verify the effectiveness of the proposed framework in addressing the two limitations of existing vid2vid approaches.
Researcher Affiliation | Industry | Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro, NVIDIA Corporation {tingchunw,mingyul,atao,guilinl,jkautz,bcatanzaro}@nvidia.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at our website.
Open Datasets | Yes | Street-scene videos: we use street-scene videos from three different geographical areas: 1) Germany, from the Cityscapes dataset [9]; 2) Boston, collected using a dashcam; and 3) NYC, collected by a different dashcam. ... At test time, in addition to the test-set images from these three areas, we also test on the ApolloScape [20] and CamVid [5] datasets, which are not included in the training set. Face videos: we use the real videos in the FaceForensics dataset [44], which contains 854 news-briefing videos from different reporters.
Dataset Splits | Yes | We split the dataset into 704 videos for training and 150 videos for validation. We divide them into a training set and a test set with no overlapping subjects.
Hardware Specification | Yes | Training was conducted using an NVIDIA DGX-1 machine with 8 32GB V100 GPUs.
Software Dependencies | No | The paper mentions using the ADAM optimizer and OpenPose, but does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | Our training procedure follows the procedure from the vid2vid work [57]. We use the ADAM optimizer [26] with lr = 0.0004 and (β1, β2) = (0.5, 0.999). ... We adopt a progressive training technique, which gradually increases the length of the training sequences. Initially, we set T = 1, which means the network only generates single frames. After that, we double the sequence length (T) every few epochs.
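To make the reported optimization and progressive-training settings concrete, the minimal PyTorch sketch below wires the quoted ADAM hyperparameters (lr = 0.0004, betas = (0.5, 0.999)) into a loop that doubles the sequence length T every few epochs. Only those hyperparameters and the doubling schedule come from the paper; the model, data, loss, image size, and the exact number of epochs per stage are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Stand-in per-frame generator; the actual few-shot vid2vid network is far larger.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# ADAM with lr = 0.0004 and (beta1, beta2) = (0.5, 0.999), as quoted above.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4, betas=(0.5, 0.999))

T = 1                  # progressive training starts with single-frame generation
epochs_per_stage = 5   # "every few epochs" -- the exact count is an assumption
num_epochs = 20        # illustrative only

for epoch in range(num_epochs):
    # One synthetic step per epoch, purely to illustrate the loop structure.
    frames = torch.randn(1, T, 3, 128, 128)          # (batch, time, C, H, W)
    target = torch.randn_like(frames)
    output = torch.stack([model(frames[:, t]) for t in range(T)], dim=1)
    loss = nn.functional.l1_loss(output, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Double the training sequence length T every few epochs, per the paper.
    if (epoch + 1) % epochs_per_stage == 0:
        T *= 2
```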