Hierarchical Long-term Video Prediction without Supervision
Authors: Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset. ... We evaluated our methods on two datasets: the Human 3.6M dataset (Ionescu et al., 2014; 2011), and a toy dataset based on videos of bouncing shapes. |
| Researcher Affiliation | Collaboration | 1Google Brain, Mountain View, CA, USA. 2Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our method code is available at https://bit.ly/2HqiHqx. ... More sample videos and code to reproduce our results are available at our project website https://bit.ly/2kS8r16. |
| Open Datasets | Yes | We evaluated our methods on two datasets: the Human 3.6M dataset (Ionescu et al., 2014; 2011), and a toy dataset based on videos of bouncing shapes. |
| Dataset Splits | Yes | In these experiments, we use subjects 1, 5, 6, 7, and 8 for training, and subject 9 for validation. Subject 11 results are reported in this paper for testing. (A minimal split sketch appears after this table.) |
| Hardware Specification | No | The paper mentions running on a 'GPU' (e.g., 'as large as we could fit in the GPU') but does not provide specific details such as model, memory, or CPU specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or frameworks used (e.g., Python, TensorFlow, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We use 64 by 64 images, and subsample the dataset to 6.25 frames per second. We train the methods to predict 32 frames and the results in this paper show predictions over 126 frames. Each method is given the first five frames as context. ... We use an encoding dimension of 64... The encoder in the EPVA method is initialized with the VGG network... α starting small, around 1e-7, and gradually increased to around 0.1 during training. ... We performed grid search on the β and learning rate to find the best configuration... For Finn et al. (2016), we performed grid search on the learning rate. (A configuration sketch also appears after this table.) |
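
The subject-level split quoted in the Dataset Splits row is simple enough to express directly. Below is a minimal sketch in Python, assuming a hypothetical loader `load_human36m_subject` (not from the paper) that returns the list of videos for one Human 3.6M subject.

```python
# Minimal sketch of the Human 3.6M subject-level split described above.
# `load_human36m_subject` is a hypothetical loader, not from the paper.

TRAIN_SUBJECTS = [1, 5, 6, 7, 8]  # training split per the paper
VAL_SUBJECTS = [9]                # validation split
TEST_SUBJECTS = [11]              # test split (results reported in the paper)

def build_splits(load_human36m_subject):
    """Group Human 3.6M videos by the subject-level split used in the paper."""
    splits = {"train": [], "val": [], "test": []}
    for subject in TRAIN_SUBJECTS:
        splits["train"].extend(load_human36m_subject(subject))
    for subject in VAL_SUBJECTS:
        splits["val"].extend(load_human36m_subject(subject))
    for subject in TEST_SUBJECTS:
        splits["test"].extend(load_human36m_subject(subject))
    return splits
```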
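
The Experiment Setup row quotes several concrete hyperparameters (64x64 frames at 6.25 fps, 5 context frames, 32 training-prediction frames, 126 evaluation frames, encoding dimension 64, and an α weight grown from about 1e-7 to about 0.1). The sketch below collects them into one configuration and illustrates one plausible α annealing schedule; the linear ramp is an assumption, since the paper only says α starts small and is gradually increased.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered for readability.
CONFIG = {
    "image_size": 64,          # 64x64 input frames
    "fps": 6.25,               # subsampled frame rate
    "context_frames": 5,       # frames given to each method as context
    "train_pred_frames": 32,   # frames predicted during training
    "eval_pred_frames": 126,   # frames predicted at evaluation time
    "encoding_dim": 64,        # encoder output dimension
    "alpha_start": 1e-7,       # initial value of the alpha loss weight
    "alpha_end": 0.1,          # approximate final value after annealing
}

def alpha_schedule(step, total_steps, cfg=CONFIG):
    """Anneal alpha from alpha_start to alpha_end over training.

    The paper only states that alpha starts around 1e-7 and is gradually
    increased to around 0.1; the linear ramp here is an assumption.
    """
    frac = min(step / float(total_steps), 1.0)
    return cfg["alpha_start"] + frac * (cfg["alpha_end"] - cfg["alpha_start"])
```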