Few-shot Video-to-Video Synthesis

Authors: Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Bryan Catanzaro, Jan Kautz

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experimental validations with comparisons to strong baselines using several large-scale video datasets, including human-dancing videos, talking-head videos, and street-scene videos. The experimental results verify the effectiveness of the proposed framework in addressing the two limitations of existing vid2vid approaches.
Researcher Affiliation | Industry | Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro, NVIDIA Corporation {tingchunw,mingyul,atao,guilinl,jkautz,bcatanzaro}@nvidia.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at our website.
Open Datasets | Yes | Street-scene videos: we use street-scene videos from three different geographical areas: 1) Germany, from the Cityscapes dataset [9]; 2) Boston, collected using a dashcam; and 3) NYC, collected by a different dashcam. ... At test time, in addition to the test-set images from these three areas, we also test on the ApolloScape [20] and CamVid [5] datasets, which are not included in the training set. Face videos: we use the real videos in the FaceForensics dataset [44], which contains 854 news-briefing videos from different reporters.
Dataset Splits | Yes | We split the dataset into 704 videos for training and 150 videos for validation. We divide them into a training set and a test set with no overlapping subjects.
Hardware Specification | Yes | Training was conducted using an NVIDIA DGX-1 machine with 8 32GB V100 GPUs.
Software Dependencies | No | The paper mentions using the ADAM optimizer and OpenPose, but does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | Our training procedure follows the procedure from the vid2vid work [57]. We use the ADAM optimizer [26] with lr = 0.0004 and (β1, β2) = (0.5, 0.999). ... We adopt a progressive training technique, which gradually increases the length of the training sequences. Initially, we set T = 1, which means the network only generates single frames. After that, we double the sequence length (T) every few epochs.
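To make the reported optimization and progressive-training settings concrete, the minimal PyTorch sketch below wires the quoted ADAM hyperparameters (lr = 0.0004, betas = (0.5, 0.999)) into a loop that doubles the sequence length T every few epochs. Only those hyperparameters and the doubling schedule come from the paper; the model, data, loss, image size, and the exact number of epochs per stage are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Stand-in per-frame generator; the actual few-shot vid2vid network is far larger.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# ADAM with lr = 0.0004 and (beta1, beta2) = (0.5, 0.999), as quoted above.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4, betas=(0.5, 0.999))

T = 1                  # progressive training starts with single-frame generation
epochs_per_stage = 5   # "every few epochs" -- the exact count is an assumption
num_epochs = 20        # illustrative only

for epoch in range(num_epochs):
    # One synthetic step per epoch, purely to illustrate the loop structure.
    frames = torch.randn(1, T, 3, 128, 128)          # (batch, time, C, H, W)
    target = torch.randn_like(frames)
    output = torch.stack([model(frames[:, t]) for t in range(T)], dim=1)
    loss = nn.functional.l1_loss(output, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Double the training sequence length T every few epochs, per the paper.
    if (epoch + 1) % epochs_per_stage == 0:
        T *= 2
```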