Hierarchical Video Generation From Orthogonal Information: Optical Flow and Texture

Authors: Katsunori Ohnishi, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our model generates more plausible motion videos and also achieves significantly improved performance for unsupervised action classification in comparison to previous GAN works.
Researcher Affiliation | Academia | Katsunori Ohnishi, The University of Tokyo, ohnishi@mi.t.u-tokyo.ac.jp; Shohei Yamamoto, The University of Tokyo, yamamoto@mi.t.u-tokyo.ac.jp; Yoshitaka Ushiku, The University of Tokyo, ushiku@mi.t.u-tokyo.ac.jp; Tatsuya Harada, The University of Tokyo / RIKEN, harada@mi.t.u-tokyo.ac.jp
Pseudocode | No | The paper describes the model architecture and processes in text and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology it describes.
Open Datasets | Yes | For evaluating video generation, we conduct experiments on two video datasets of human actions. For the real-world human video dataset, we use Penn Action (Zhang, Zhu, and Derpanis 2013), which has 2326 videos of 15 different classes and 163841 frames. ... For the Computer Graphics (CG) human video dataset, we use SURREAL (Varol et al. 2017), which is made by synthesizing CG humans onto LSUN (Yu et al. 2015) images and consists of 67582 videos. ... Table 2 shows the action classification performance on UCF101 (Soomro, Zamir, and Shah 2012).
Dataset Splits | Yes | We employ the original train/test split. (for Penn Action) In the original train/test split, even the test set has 12538 videos, which contain 1194662 frames. Thus, we use the original test videos for training and a subset of 1659 videos from the original training split for testing. (for SURREAL)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | First, we train the Flow GAN and Texture GAN independently by using the Adam (Kingma and Ba 2014) optimizer... For optical flow computation, we use EpicFlow (Revaud et al. 2015). ... For optical flow estimation, we employ the algorithm proposed by Brox et al. (Brox et al. 2004), following Two-stream (Simonyan and Zisserman 2014). No version numbers are given for any software dependencies. (A hedged optical-flow extraction sketch follows the table.)
Experiment Setup | Yes | First, we train the Flow GAN and Texture GAN independently by using the Adam (Kingma and Ba 2014) optimizer with an initial learning rate α = 0.0002 and momentum parameter β1 = 0.5. The learning rate is decayed to 1/2 of its previous value six times during training. The latent variables z_tex and z_flow are drawn from 100-dimensional Gaussian distributions. We set a batch size of 32. Batch normalization (Ioffe and Szegedy 2015) and Rectified Linear Unit (ReLU) activation are applied after every up-sampling convolution, except for the last layer. For down-sampling convolutions, we also apply batch normalization and Leaky ReLU (Xu et al. 2015) to all layers but only apply batch normalization to the first layer. After training them independently, we join both networks and train our FTGAN full network. Following S2GAN, we set a small learning rate α = 1e-7 for the Flow GAN and α = 1e-6 for the Texture GAN during the joint learning. (A hedged PyTorch sketch of this configuration follows the table.)
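The paper reports using EpicFlow (Revaud et al. 2015) for flow computation and the Brox et al. (2004) algorithm for the classification experiments, but gives no versions and releases no code. The sketch below only illustrates the generic step of extracting dense optical flow from consecutive frame pairs; OpenCV's Farneback flow and the helper name `extract_flow` are stand-ins chosen for this example, not the authors' actual tooling.

```python
# Minimal sketch of per-frame dense optical flow extraction, assuming OpenCV.
# Farneback flow is used purely as a stand-in for EpicFlow / Brox et al.,
# which are not part of OpenCV's core module.
import cv2
import numpy as np

def extract_flow(video_path):
    """Return a list of (H, W, 2) dense flow fields, one per consecutive frame pair."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return []
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback parameters: pyramid scale, levels, window size, iterations,
        # polynomial neighborhood size, polynomial sigma, flags.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.astype(np.float32))
        prev_gray = gray
    cap.release()
    return flows
```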
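The hyperparameters in the Experiment Setup row are specific enough to restate the optimizer configuration in code. The PyTorch sketch below is an assumption-laden illustration, not the authors' implementation: the module names, layer counts, output channels, and learning-rate decay epochs are placeholders (the paper only says the rate is halved six times), while the learning rates, β1, latent dimensionality, and batch size come directly from the text.

```python
# Sketch of the reported FTGAN training configuration in PyTorch.
# All network definitions are placeholders; the paper does not release code.
import torch
import torch.nn as nn

Z_DIM = 100        # 100-dimensional Gaussian latents z_tex and z_flow
BATCH_SIZE = 32
LR_INIT = 2e-4     # alpha = 0.0002
BETA1 = 0.5        # Adam momentum parameter beta_1

def up_block(in_ch, out_ch, last=False):
    # Up-sampling convolution followed by BatchNorm + ReLU, except for the last layer.
    layers = [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if not last:
        layers += [nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def down_block(in_ch, out_ch, first=False):
    # Down-sampling convolution with Leaky ReLU; skipping batch norm on the first
    # layer follows the common DCGAN convention and is an assumption here.
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if not first:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

# Placeholder generators; layer counts and output channels are illustrative only.
flow_gan = nn.Sequential(up_block(Z_DIM, 256), up_block(256, 128), up_block(128, 2, last=True))
texture_gan = nn.Sequential(up_block(Z_DIM, 256), up_block(256, 128), up_block(128, 3, last=True))

# Stage 1: each GAN is trained independently with Adam (lr = 2e-4, beta1 = 0.5);
# the learning rate is halved six times (the milestone epochs below are assumptions).
opt_flow = torch.optim.Adam(flow_gan.parameters(), lr=LR_INIT, betas=(BETA1, 0.999))
opt_tex = torch.optim.Adam(texture_gan.parameters(), lr=LR_INIT, betas=(BETA1, 0.999))
sched_flow = torch.optim.lr_scheduler.MultiStepLR(opt_flow, milestones=[10, 20, 30, 40, 50, 60], gamma=0.5)
sched_tex = torch.optim.lr_scheduler.MultiStepLR(opt_tex, milestones=[10, 20, 30, 40, 50, 60], gamma=0.5)
# sched_flow.step() / sched_tex.step() would be called once per epoch.

# Stage 2: joint fine-tuning of the full FTGAN with the small learning rates
# reported in the paper (1e-7 for the Flow GAN, 1e-6 for the Texture GAN).
opt_joint = torch.optim.Adam([
    {"params": flow_gan.parameters(), "lr": 1e-7},
    {"params": texture_gan.parameters(), "lr": 1e-6},
], betas=(BETA1, 0.999))

# Sampling the 100-dimensional Gaussian latent variables for one batch of 32.
z_flow = torch.randn(BATCH_SIZE, Z_DIM, 1, 1)
z_tex = torch.randn(BATCH_SIZE, Z_DIM, 1, 1)
```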