Hierarchical Video Generation From Orthogonal Information: Optical Flow and Texture

Authors: Katsunori Ohnishi, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our model generates more plausible motion videos and also achieves significantly improved performance for unsupervised action classification in comparison to previous GAN works.
Researcher Affiliation | Academia | Katsunori Ohnishi, The University of Tokyo, ohnishi@mi.t.u-tokyo.ac.jp; Shohei Yamamoto, The University of Tokyo, yamamoto@mi.t.u-tokyo.ac.jp; Yoshitaka Ushiku, The University of Tokyo, ushiku@mi.t.u-tokyo.ac.jp; Tatsuya Harada, The University of Tokyo / RIKEN, harada@mi.t.u-tokyo.ac.jp
Pseudocode | No | The paper describes the model architecture and processes in text and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology it describes.
Open Datasets | Yes | For evaluating video generation, we conduct experiments on two video datasets of human actions. For the real-world human video dataset, we use Penn Action (Zhang, Zhu, and Derpanis 2013), which has 2326 videos of 15 different classes and 163841 frames. ... For the Computer Graphics (CG) human video dataset, we use SURREAL (Varol et al. 2017), which is made by synthesizing CG humans onto LSUN (Yu et al. 2015) images and consists of 67582 videos. ... Table 2 shows the action classification performance on UCF101 (Soomro, Zamir, and Shah 2012).
Dataset Splits | Yes | We employ the original train/test split. (for Penn Action) In the original train/test split, even the test set has 12538 videos, which contain 1194662 frames. Thus, we use the original test videos for training and a subset of 1659 videos from the original training split for testing. (for SURREAL)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | First, we train the Flow GAN and Texture GAN independently by using the Adam (Kingma and Ba 2014) optimizer... For optical flow computation, we use EpicFlow (Revaud et al. 2015). ... For optical flow estimation, we employ the algorithm proposed by Brox et al. (Brox et al. 2004), following Two-stream (Simonyan and Zisserman 2014). No version numbers are given for any software dependencies. (A hedged optical-flow extraction sketch follows the table.)
Experiment Setup | Yes | First, we train the Flow GAN and Texture GAN independently by using the Adam (Kingma and Ba 2014) optimizer with an initial learning rate α = 0.0002 and momentum parameter β1 = 0.5. The learning rate is decayed to 1/2 of its previous value six times during training. The latent variables z_tex and z_flow are drawn from 100-dimensional Gaussian distributions. We set a batch size of 32. Batch normalization (Ioffe and Szegedy 2015) and Rectified Linear Unit (ReLU) activation are applied after every up-sampling convolution, except for the last layer. For down-sampling convolutions, we also apply batch normalization and Leaky ReLU (Xu et al. 2015) to all layers but only apply batch normalization to the first layer. After training them independently, we join both networks and train our FTGAN full network. Following S2GAN, we set a small learning rate α = 1e-7 for the Flow GAN and α = 1e-6 for the Texture GAN during the joint learning. (A hedged PyTorch sketch of this configuration follows the table.)
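The paper reports using EpicFlow (Revaud et al. 2015) for flow computation and the Brox et al. (2004) algorithm for the classification experiments, but gives no versions and releases no code. The sketch below only illustrates the generic step of extracting dense optical flow from consecutive frame pairs; OpenCV's Farneback flow and the helper name `extract_flow` are stand-ins chosen for this example, not the authors' actual tooling.

```python
# Minimal sketch of per-frame dense optical flow extraction, assuming OpenCV.
# Farneback flow is used purely as a stand-in for EpicFlow / Brox et al.,
# which are not part of OpenCV's core module.
import cv2
import numpy as np

def extract_flow(video_path):
    """Return a list of (H, W, 2) dense flow fields, one per consecutive frame pair."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return []
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback parameters: pyramid scale, levels, window size, iterations,
        # polynomial neighborhood size, polynomial sigma, flags.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.astype(np.float32))
        prev_gray = gray
    cap.release()
    return flows
```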
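The hyperparameters in the Experiment Setup row are specific enough to restate the optimizer configuration in code. The PyTorch sketch below is an assumption-laden illustration, not the authors' implementation: the module names, layer counts, output channels, and learning-rate decay epochs are placeholders (the paper only says the rate is halved six times), while the learning rates, β1, latent dimensionality, and batch size come directly from the text.

```python
# Sketch of the reported FTGAN training configuration in PyTorch.
# All network definitions are placeholders; the paper does not release code.
import torch
import torch.nn as nn

Z_DIM = 100        # 100-dimensional Gaussian latents z_tex and z_flow
BATCH_SIZE = 32
LR_INIT = 2e-4     # alpha = 0.0002
BETA1 = 0.5        # Adam momentum parameter beta_1

def up_block(in_ch, out_ch, last=False):
    # Up-sampling convolution followed by BatchNorm + ReLU, except for the last layer.
    layers = [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if not last:
        layers += [nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def down_block(in_ch, out_ch, first=False):
    # Down-sampling convolution with Leaky ReLU; skipping batch norm on the first
    # layer follows the common DCGAN convention and is an assumption here.
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if not first:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

# Placeholder generators; layer counts and output channels are illustrative only.
flow_gan = nn.Sequential(up_block(Z_DIM, 256), up_block(256, 128), up_block(128, 2, last=True))
texture_gan = nn.Sequential(up_block(Z_DIM, 256), up_block(256, 128), up_block(128, 3, last=True))

# Stage 1: each GAN is trained independently with Adam (lr = 2e-4, beta1 = 0.5);
# the learning rate is halved six times (the milestone epochs below are assumptions).
opt_flow = torch.optim.Adam(flow_gan.parameters(), lr=LR_INIT, betas=(BETA1, 0.999))
opt_tex = torch.optim.Adam(texture_gan.parameters(), lr=LR_INIT, betas=(BETA1, 0.999))
sched_flow = torch.optim.lr_scheduler.MultiStepLR(opt_flow, milestones=[10, 20, 30, 40, 50, 60], gamma=0.5)
sched_tex = torch.optim.lr_scheduler.MultiStepLR(opt_tex, milestones=[10, 20, 30, 40, 50, 60], gamma=0.5)
# sched_flow.step() / sched_tex.step() would be called once per epoch.

# Stage 2: joint fine-tuning of the full FTGAN with the small learning rates
# reported in the paper (1e-7 for the Flow GAN, 1e-6 for the Texture GAN).
opt_joint = torch.optim.Adam([
    {"params": flow_gan.parameters(), "lr": 1e-7},
    {"params": texture_gan.parameters(), "lr": 1e-6},
], betas=(BETA1, 0.999))

# Sampling the 100-dimensional Gaussian latent variables for one batch of 32.
z_flow = torch.randn(BATCH_SIZE, Z_DIM, 1, 1)
z_tex = torch.randn(BATCH_SIZE, Z_DIM, 1, 1)
```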