High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
Authors: Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving. |
| Researcher Affiliation | Collaboration | Ruben Villegas (1, 4), Arkanath Pathak (3), Harini Kannan (2), Dumitru Erhan (2), Quoc V. Le (2), Honglak Lee (2); 1: University of Michigan, 2: Google Research, 3: Google, 4: Adobe Research |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology is openly available, nor does it provide a direct link to a code repository. It mentions a website for qualitative results and supplementary material for other details, but not code. |
| Open Datasets | Yes | We use the action-conditioned towel pick dataset from Ebert et al. [2018]... We use the Human 3.6M dataset [Ionescu et al., 2014]... We use the KITTI driving dataset [Geiger et al., 2013] |
| Dataset Splits | Yes | We use the train/test split from Villegas et al. [2017b]... We use the train/test split from Lotter et al. [2017] in our experiments. |
| Hardware Specification | No | Details of the devices we use to scale up computation can be found in the supplementary material. The main paper does not provide specific hardware details for the experiments. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | To increase the capacity of our baseline model, we use hyperparameters K and M, which denote the factors by which the number of neurons in each layer of the encoder, decoder and LSTMs are increased. In our experiments we increase both K and M together until we reach the device limits. Due to the LSTM having more parameters, we stop increasing the capacity of the LSTM at M = 3 but continue to increase K up to 5. During training time, the models are conditioned on 2 input frames and predict 10 frames into the future. During test time, the models predict 18 frames into the future. |
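The capacity-scaling scheme quoted above (widen encoder/decoder layers by a factor K and LSTM layers by a factor M, growing both together but capping M at 3 while K continues up to 5) can be illustrated with a minimal sketch. The base layer widths below are hypothetical placeholders, not values reported in the paper:

```python
# Hypothetical sketch of the K/M capacity scaling described in the
# Experiment Setup row. Base widths are illustrative, not from the paper.

BASE_ENC_DEC_WIDTHS = [64, 128, 256]  # assumed encoder/decoder layer widths
BASE_LSTM_WIDTH = 256                 # assumed LSTM hidden size

def scaled_widths(K, M, lstm_cap=3):
    """Scale encoder/decoder widths by K and the LSTM width by M.

    Following the paper's description, the LSTM multiplier stops
    growing at M = lstm_cap (3) even as K increases up to 5.
    """
    enc_dec = [w * K for w in BASE_ENC_DEC_WIDTHS]
    lstm = BASE_LSTM_WIDTH * min(M, lstm_cap)
    return enc_dec, lstm

# Grow K and M together until the stated limits.
for k in range(1, 6):
    enc_dec, lstm = scaled_widths(K=k, M=k)
    print(f"K={k}, M={min(k, 3)}: enc/dec widths {enc_dec}, LSTM width {lstm}")
```

Under this sketch, the largest configuration (K = 5, M = 3) widens the encoder/decoder fivefold while the LSTM plateaus at three times its base size.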