High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

Authors: Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving." |
| Researcher Affiliation | Collaboration | Ruben Villegas (1, 4), Arkanath Pathak (3), Harini Kannan (2), Dumitru Erhan (2), Quoc V. Le (2), Honglak Lee (2); 1: University of Michigan, 2: Google Research, 3: Google, 4: Adobe Research |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology is openly available, nor does it provide a direct link to a code repository. It mentions a website for qualitative results and supplementary material for other details, but not code. |
| Open Datasets | Yes | "We use the action-conditioned towel pick dataset from Ebert et al. [2018]... We use the Human 3.6M dataset [Ionescu et al., 2014]... We use the KITTI driving dataset [Geiger et al., 2013]" |
| Dataset Splits | Yes | "We use the train/test split from Villegas et al. [2017b]... We use the train/test split from Lotter et al. [2017] in our experiments." |
| Hardware Specification | No | "Details of the devices we use to scale up computation can be found in the supplementary material." The main paper does not provide specific hardware details for the experiments. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | "To increase the capacity of our baseline model, we use hyperparameters K and M, which denote the factors by which the number of neurons in each layer of the encoder, decoder and LSTMs are increased. In our experiments we increase both K and M together until we reach the device limits. Due to the LSTM having more parameters, we stop increasing the capacity of the LSTM at M = 3 but continue to increase K up to 5. During training time, the models are conditioned on 2 input frames and predict 10 frames into the future. During test time, the models predict 18 frames into the future." |
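The Experiment Setup entry describes a capacity-scaling scheme: encoder/decoder layer widths grow by a factor K (up to 5) and LSTM widths by a factor M (capped at 3). A minimal sketch of how such multipliers might be applied is shown below; the baseline layer widths and the function name are hypothetical illustrations, not values or code from the paper.

```python
# Hedged sketch of the K/M capacity scaling described in the quoted setup.
# Baseline widths below are hypothetical placeholders, not from the paper.

def scale_capacity(base_enc_dec, base_lstm, k, m, k_cap=5, m_cap=3):
    """Multiply encoder/decoder widths by K and LSTM widths by M.

    Per the quoted setup, M stops increasing at 3 (the LSTM has more
    parameters) while K may continue up to 5.
    """
    k = min(k, k_cap)
    m = min(m, m_cap)
    enc_dec_widths = [w * k for w in base_enc_dec]
    lstm_widths = [w * m for w in base_lstm]
    return enc_dec_widths, lstm_widths

# Example with hypothetical baseline widths: requesting K = M = 5
# caps the LSTM multiplier at 3 while the encoder/decoder uses K = 5.
enc_dec, lstm = scale_capacity([32, 64, 128], [256], k=5, m=5)
print(enc_dec)  # [160, 320, 640]
print(lstm)     # [768]
```

This mirrors the paper's stated procedure of increasing K and M together until hitting device limits, with the LSTM multiplier saturating first.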