Hierarchical Long-term Video Prediction without Supervision

Authors: Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset." ... "We evaluated our methods on two datasets: the Human 3.6M dataset (Ionescu et al., 2014; 2011), and a toy dataset based on videos of bouncing shapes."
Researcher Affiliation | Collaboration | "1 Google Brain, Mountain View, CA, USA. 2 Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our method code is available at https://bit.ly/2HqiHqx" and "More sample videos and code to reproduce our results are available at our project website https://bit.ly/2kS8r16."
Open Datasets | Yes | "We evaluated our methods on two datasets: the Human 3.6M dataset (Ionescu et al., 2014; 2011), and a toy dataset based on videos of bouncing shapes."
Dataset Splits | Yes | "In these experiments, we use subjects 1, 5, 6, 7, and 8 for training, and subject 9 for validation. Subject 11 results are reported in this paper for testing."
Hardware Specification | No | The paper mentions running on a GPU (e.g., "as large as we could fit in the GPU") but does not provide specific details such as model, memory, or CPU specifications.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or frameworks used (e.g., Python, TensorFlow, PyTorch, CUDA versions).
Experiment Setup | Yes | "We use 64 by 64 images, and subsample the dataset to 6.25 frames per second. We train the methods to predict 32 frames and the results in this paper show predictions over 126 frames. Each method is given the first five frames as context." ... "We use an encoding dimension of 64..." ... "The encoder in the EPVA method is initialized with the VGG network..." ... "α starting small, around 1e-7, and gradually increased to around 0.1 during training." ... "We performed grid search on the β and learning rate to find the best configuration..." ... "For Finn et al. (2016), we performed grid search on the learning rate."
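The Dataset Splits and Experiment Setup rows can be collected into a single configuration sketch. This is a hedged illustration only: the names (`CONFIG`, `alpha_schedule`) and the linear annealing shape are assumptions, since the paper states only that α starts around 1e-7 and is gradually increased to around 0.1.

```python
# Illustrative summary of the reported setup; not the authors' code.
CONFIG = {
    "image_size": (64, 64),             # "We use 64 by 64 images"
    "fps": 6.25,                        # dataset subsampled to 6.25 frames/second
    "context_frames": 5,                # first five frames given as context
    "train_pred_frames": 32,            # frames predicted during training
    "eval_pred_frames": 126,            # frames shown in the paper's results
    "encoding_dim": 64,                 # encoder output dimension
    "train_subjects": [1, 5, 6, 7, 8],  # Human 3.6M training split
    "val_subject": 9,                   # validation split
    "test_subject": 11,                 # test split reported in the paper
}

def alpha_schedule(step, total_steps, start=1e-7, end=0.1):
    """Anneal the weight alpha from `start` to `end` over training.

    The linear ramp is an assumption; the paper only says alpha starts
    around 1e-7 and is gradually increased to around 0.1.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)
```

A usage example: `alpha_schedule(0, 100_000)` returns the starting value 1e-7, and `alpha_schedule(100_000, 100_000)` reaches approximately 0.1.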