Few-shot Video-to-Video Synthesis
Authors: Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experimental validations with comparisons to strong baselines using several large-scale video datasets including human-dancing videos, talking-head videos, and street-scene videos. The experimental results verify the effectiveness of the proposed framework in addressing the two limitations of existing vid2vid approaches. |
| Researcher Affiliation | Industry | Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro NVIDIA Corporation {tingchunw,mingyul,atao,guilinl,jkautz,bcatanzaro}@nvidia.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is available at our website. |
| Open Datasets | Yes | Street-scene videos. We use street-scene videos from three different geographical areas: 1) Germany, from the Cityscapes dataset [9], 2) Boston, collected using a dashcam, and 3) NYC, collected by a different dashcam. ... At test time, in addition to the test set images from these three areas, we also test on the ApolloScape [20] and CamVid [5] datasets, which are not included in the training set. Face videos. We use the real videos in the FaceForensics dataset [44], which contains 854 videos of news briefing from different reporters. |
| Dataset Splits | Yes | We split the dataset into 704 videos for training and 150 videos for validation. ... We divide them into a training set and a test set with no overlapping subjects. |
| Hardware Specification | Yes | Training was conducted using an NVIDIA DGX-1 machine with 8 32GB V100 GPUs. |
| Software Dependencies | No | The paper mentions using the ADAM optimizer and OpenPose, but does not specify version numbers for any key software components or libraries. |
| Experiment Setup | Yes | Our training procedure follows the procedure from the vid2vid work [57]. We use the ADAM optimizer [26] with lr = 0.0004 and (β1, β2) = (0.5, 0.999). ... We adopt a progressive training technique, which gradually increases the length of training sequences. Initially, we set T = 1, which means the network only generates single frames. After that, we double the sequence length (T) for every few epochs. |
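
The quoted experiment setup (ADAM with lr = 0.0004 and (β1, β2) = (0.5, 0.999), plus progressive training that starts at T = 1 and doubles the sequence length every few epochs) can be summarized in a minimal sketch. The generator, loss, batch shapes, and the `epochs_per_stage` value below are placeholders chosen for illustration, not the authors' implementation.

```python
# Minimal sketch of the described training schedule, assuming PyTorch.
# PlaceholderGenerator stands in for the few-shot vid2vid generator;
# the real architecture, losses, and data loading are not reproduced here.
import torch
import torch.nn as nn

class PlaceholderGenerator(nn.Module):
    """Stand-in generator: maps conditioning frames to output frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, frames):
        # frames: (batch, T, 3, H, W) -> generate T output frames
        b, t, c, h, w = frames.shape
        return self.net(frames.view(b * t, c, h, w)).view(b, t, c, h, w)

generator = PlaceholderGenerator()
# Optimizer settings quoted from the paper: lr = 0.0004, betas = (0.5, 0.999).
optimizer = torch.optim.Adam(generator.parameters(), lr=4e-4, betas=(0.5, 0.999))

num_epochs = 40
epochs_per_stage = 10   # "every few epochs" -- the exact value is an assumption
seq_len = 1             # start by generating single frames (T = 1)

for epoch in range(num_epochs):
    # Progressive training: double the sequence length at each new stage.
    if epoch > 0 and epoch % epochs_per_stage == 0:
        seq_len *= 2

    # One dummy batch per epoch to keep the sketch self-contained.
    semantic_maps = torch.randn(2, seq_len, 3, 64, 64)   # conditioning inputs
    target_frames = torch.randn(2, seq_len, 3, 64, 64)   # ground-truth frames

    optimizer.zero_grad()
    generated = generator(semantic_maps)
    loss = nn.functional.l1_loss(generated, target_frames)  # stand-in loss
    loss.backward()
    optimizer.step()
```

In the actual method the generated frames would also feed GAN and flow-based losses following the vid2vid training procedure [57]; the L1 loss above only keeps the schedule sketch runnable.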