Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

Authors: Sihyun Yu, Jihoon Tack, Sangwoo Mo, Hyunsu Kim, Junho Kim, Jung-Woo Ha, Jinwoo Shin

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the superiority of DIGAN under various datasets, along with multiple intriguing properties, e.g., long video synthesis, video extrapolation, and non-autoregressive video generation. For example, DIGAN improves the previous state-of-the-art FVD score on UCF-101 by 30.7% and can be trained on 128-frame videos of 128×128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method. We present the setups and main video generation results in Section 4.1. We then exhibit the intriguing properties of DIGAN in Section 4.2. Finally, we conduct ablation studies in Section 4.3.
Researcher Affiliation | Collaboration | ¹Korea Advanced Institute of Science and Technology (KAIST), ²NAVER AI Lab; {sihyun.yu, jihoontack, swmo, jinwoos}@kaist.ac.kr, {hyunsu1125.kim, jhkim.ai, jungwoo.ha}@navercorp.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We also provide our code in the supplementary material.
Open Datasets | Yes | We conduct the experiments on UCF-101 (Soomro et al., 2012), Tai-Chi-HD (Tai Chi; Siarohin et al. (2019)), Sky Time-lapse (Sky; Xiong et al. (2018)), and a food-class subset of Kinetics-600 (Kinetics-food; Carreira et al. (2018)). ... Tai-Chi-HD (Siarohin et al., 2019) is a video dataset of 280 long videos of people doing Tai-Chi. We use the official link (https://github.com/AliaksandrSiarohin/first-order-model) for downloading the dataset and cropping it into videos of 128×128 resolution. ... Sky Time-lapse (Xiong et al., 2018) is a collection of 5,000 sky time-lapse videos in total. We use the same data pre-processing as the official link (https://github.com/weixiong-ur/mdgan). A hedged sketch of this kind of frame preprocessing is given after the table.
Dataset Splits | Yes | UCF-101 ... We conducted two different experiments for a fair comparison: training the model with the train split of 9,357 videos or with all 13,320 videos without the split (following the setup of prior state-of-the-art baselines (Clark et al., 2019; Tian et al., 2021)). ... Tai-Chi-HD ... We use all data, without a split, for training. ... Sky Time-lapse ... We use the train split for training the model and the test split for evaluation, following the setups in prior works. ... Kinetics-600 ... We use the train split for model training and the validation set for evaluation.
Hardware Specification | Yes | With these setups, all the experiments are processed with 4 NVIDIA V100 32GB GPUs, where it takes at most 4.4 days to complete. ... We used an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and a Titan Xp GPU for the measurement. ... All experiments are performed on the same machine (NVIDIA V100 32GB GPUs).
Software Dependencies | No | The paper mentions software such as StyleGAN2, INR-GAN, and DiffAug, but does not provide specific version numbers for these software dependencies or libraries.
Experiment Setup | Yes | We set the spatial frequencies σ_x = σ_y = 10 and use the StyleGAN2 (Karras et al., 2020b) discriminator. We use a small temporal frequency σ_t = 0.25 for all experiments. ... We use the same discriminator as INR-GAN for both the image and motion discriminators, differing only in the input channels: 3 and 7. ... We also apply DiffAug (Zhao et al., 2020) to mitigate overfitting from the limited number of videos, as in Tian et al. (2021). ... All other hyperparameters are identical to StyleGAN2, except for the R1 regularization coefficient γ: we use γ = 1 in all experiments. A hedged sketch of the coordinate embedding these frequencies configure follows the table.
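The Tai-Chi-HD and Sky Time-lapse rows above quote downloading the videos and cropping them to 128×128 resolution. The snippet below is a minimal sketch of that kind of preprocessing under stated assumptions, not the official scripts linked in the paper: the use of OpenCV, the function name load_clip, and the center-crop strategy are illustrative choices only; the stated 128×128 target resolution comes from the paper.

```python
import cv2
import numpy as np

def load_clip(path, size=128):
    """Read a video, center-crop each frame to a square, and resize to size x size.

    Hypothetical helper for illustration; not the paper's released preprocessing.
    """
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()                      # BGR frame of shape (H, W, 3)
        if not ok:
            break
        h, w = frame.shape[:2]
        s = min(h, w)
        top, left = (h - s) // 2, (w - s) // 2
        crop = frame[top:top + s, left:left + s]    # center square crop
        crop = cv2.resize(crop, (size, size), interpolation=cv2.INTER_AREA)
        frames.append(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)                         # (T, 128, 128, 3) uint8 RGB
```

A center crop followed by resizing is used here simply because it preserves the aspect ratio of the subject; the official repositories linked in the table may crop differently.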
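The Experiment Setup row quotes spatial frequencies σ_x = σ_y = 10 and a small temporal frequency σ_t = 0.25 for the implicit (coordinate-based) generator. The sketch below shows one way such per-axis frequency scales could enter a random Fourier-feature embedding of (x, y, t) coordinates; the embedding form, the feature dimension, and the function name fourier_features are assumptions for illustration, not the authors' implementation.

```python
import math
import torch

def fourier_features(coords, sigmas, num_feats=64, seed=0):
    """Random Fourier features for normalized (x, y, t) coordinates.

    coords: (N, 3) tensor; sigmas: per-axis frequency scales, e.g. [10, 10, 0.25].
    Illustrative sketch only; the paper's exact embedding may differ.
    """
    g = torch.Generator().manual_seed(seed)
    # One random frequency vector per coordinate axis, scaled by that axis' sigma.
    freqs = torch.randn(3, num_feats, generator=g) * torch.tensor(sigmas).unsqueeze(1)
    proj = coords @ freqs  # (N, num_feats)
    return torch.cat([torch.sin(2 * math.pi * proj),
                      torch.cos(2 * math.pi * proj)], dim=-1)

# Example: embed a 4x4 spatial grid over 3 time steps.
xs, ys, ts = torch.meshgrid(torch.linspace(0, 1, 4),
                            torch.linspace(0, 1, 4),
                            torch.linspace(0, 1, 3), indexing="ij")
coords = torch.stack([xs, ys, ts], dim=-1).reshape(-1, 3)
emb = fourier_features(coords, sigmas=[10.0, 10.0, 0.25])
print(emb.shape)  # torch.Size([48, 128])
```

With a small σ_t, the projected coordinates vary slowly along the time axis, so the embedding changes smoothly from frame to frame, which is consistent with the quoted choice of a small temporal frequency.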