HAF-SVG: Hierarchical Stochastic Video Generation with Aligned Features
Authors: Zhihui Lin, Chun Yuan, Maomao Li
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Moving MNIST, BAIR, and KTH datasets demonstrate that hierarchical structure is helpful for modeling more accurate future uncertainty, and the feature aligner is beneficial to generate realistic frames. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Technologies, Tsinghua University, Beijing, China; (2) Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; (3) Peng Cheng Laboratory, Shenzhen, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides no concrete access to its source code (e.g., a repository link, an explicit code-release statement, or a mention of code in supplementary materials). |
| Open Datasets | Yes | We perform experiments on synthetic sequences (Moving MNIST [Srivastava et al., 2015]), as well as real-world videos (KTH action [Schuldt et al., 2004] and BAIR robot [Ebert et al., 2017]). |
| Dataset Splits | No | The paper describes the training and evaluation protocol ('Each training sequence consists of 15 consecutive frames, 5 for the input and 10 for the prediction'; 'For each sequence, 100 predictions are sampled and one with the best score with respect to the ground-truth' is kept; see the best-of-K sketch after this table), but it does not specify explicit dataset splits (e.g., percentages or counts) for distinct training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using VGG16 and DCGAN architectures, and LSTM layers, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We adopt the experiment setup in SVG [Denton and Fergus, 2018], where all frames are resized to 64×64. LSTMθ is implemented as a two-layer LSTM with 256 cells in each layer, while LSTMφj and LSTMψj are single-layer LSTMs with 256 cells. The output dimensionality of the LSTM networks is 128, and |ht| = 128 for all three datasets. For KTH and BAIR, the encoder E adopts the VGG16 [Simonyan and Zisserman, 2015] architecture, and the frame decoder D is a mirrored version of the encoder. |µφj| = |µψj| is set to 24 on KTH and 64 on BAIR. For Moving MNIST, we adopt the DCGAN discriminator architecture [Radford et al., 2016] as E, the DCGAN generator architecture as D, and |µφj| = |µψj| = 16. We use β = 1e-4 for Moving MNIST and BAIR, and β = 1e-6 for KTH (a configuration sketch follows this table). |
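To make the quoted setup concrete, here is a minimal configuration sketch, assuming PyTorch. The helper names (`DATASET_CFG`, `build_frame_predictor`, `build_latent_lstm`) and the exact wiring of inputs are illustrative guesses, since the paper releases no code; the sizes follow the Experiment Setup row above.

```python
# A minimal sketch of the quoted hyperparameters, assuming PyTorch.
import torch.nn as nn

FRAME_SIZE = 64   # all frames resized to 64x64
HIDDEN = 256      # LSTM cells per layer
OUT_DIM = 128     # LSTM output dimensionality; |h_t| = 128

# Per-dataset latent size |mu_phi| = |mu_psi| and KL weight beta.
DATASET_CFG = {
    "moving_mnist": {"z_dim": 16, "beta": 1e-4},  # DCGAN encoder/decoder
    "kth":          {"z_dim": 24, "beta": 1e-6},  # VGG16 encoder, mirrored decoder
    "bair":         {"z_dim": 64, "beta": 1e-4},  # VGG16 encoder, mirrored decoder
}

def build_frame_predictor(z_dim: int) -> nn.LSTM:
    # LSTM_theta: two layers with 256 cells each. Feeding [h_t; z_t] as
    # input is an assumption carried over from SVG-style models.
    return nn.LSTM(input_size=OUT_DIM + z_dim, hidden_size=HIDDEN,
                   num_layers=2, batch_first=True)

def build_latent_lstm() -> nn.LSTM:
    # LSTM_phi_j / LSTM_psi_j: a single layer with 256 cells; a linear
    # (mu, logvar) head of size z_dim would follow (omitted here).
    return nn.LSTM(input_size=OUT_DIM, hidden_size=HIDDEN,
                   num_layers=1, batch_first=True)
```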
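Similarly, the best-of-100 scoring quoted under Dataset Splits can be summarized in a short sketch; `model.sample` and `score` are hypothetical stand-ins for interfaces the paper does not specify.

```python
# A sketch of the best-of-K evaluation protocol: sample K = 100 stochastic
# futures and keep the one scoring best against the ground truth.
def best_of_k(model, context, ground_truth, score, k=100):
    best_pred, best_score = None, float("-inf")
    for _ in range(k):
        # Predict as many frames as the ground-truth future contains
        # (10 in the paper's setup: 5 input frames, 10 predicted).
        pred = model.sample(context, steps=len(ground_truth))
        s = score(pred, ground_truth)  # e.g., an SSIM-style similarity
        if s > best_score:
            best_pred, best_score = pred, s
    return best_pred, best_score
```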