Hierarchical Long-term Video Prediction without Supervision
Authors: Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset. ... We evaluated our methods on two datasets: the Human 3.6M dataset (Ionescu et al., 2014; 2011), and a toy dataset based on videos of bouncing shapes. |
| Researcher Affiliation | Collaboration | 1Google Brain, Mountain View, CA, USA. 2Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our method code is available at https://bit.ly/2HqiHqx. ... More sample videos and code to reproduce our results are available at our project website https://bit.ly/2kS8r16. |
| Open Datasets | Yes | We evaluated our methods on two datasets: the Human 3.6M dataset (Ionescu et al., 2014; 2011), and a toy dataset based on videos of bouncing shapes. |
| Dataset Splits | Yes | In these experiments, we use subjects 1, 5, 6, 7, and 8 for training, and subject 9 for validation. Subject 11 results are reported in this paper for testing. (A minimal split sketch appears after this table.) |
| Hardware Specification | No | The paper mentions running on a 'GPU' (e.g., 'as large as we could fit in the GPU') but does not provide specific details such as model, memory, or CPU specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or frameworks used (e.g., Python, TensorFlow, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We use 64 by 64 images, and subsample the dataset to 6.25 frames per second. We train the methods to predict 32 frames and the results in this paper show predictions over 126 frames. Each method is given the first five frames as context. ... We use an encoding dimension of 64... The encoder in the EPVA method is initialized with the VGG network... α starting small, around 1e-7, and gradually increased to around 0.1 during training. ... We performed grid search on the β and learning rate to find the best configuration... For Finn et al. (2016), we performed grid search on the learning rate. (A configuration sketch also appears after this table.) |
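
The subject-level split quoted in the Dataset Splits row is simple enough to express directly. Below is a minimal sketch in Python, assuming a hypothetical loader `load_human36m_subject` (not from the paper) that returns the list of videos for one Human 3.6M subject.

```python
# Minimal sketch of the Human 3.6M subject-level split described above.
# `load_human36m_subject` is a hypothetical loader, not from the paper.

TRAIN_SUBJECTS = [1, 5, 6, 7, 8]  # training split per the paper
VAL_SUBJECTS = [9]                # validation split
TEST_SUBJECTS = [11]              # test split (results reported in the paper)

def build_splits(load_human36m_subject):
    """Group Human 3.6M videos by the subject-level split used in the paper."""
    splits = {"train": [], "val": [], "test": []}
    for subject in TRAIN_SUBJECTS:
        splits["train"].extend(load_human36m_subject(subject))
    for subject in VAL_SUBJECTS:
        splits["val"].extend(load_human36m_subject(subject))
    for subject in TEST_SUBJECTS:
        splits["test"].extend(load_human36m_subject(subject))
    return splits
```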
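
The Experiment Setup row quotes several concrete hyperparameters (64x64 frames at 6.25 fps, 5 context frames, 32 training-prediction frames, 126 evaluation frames, encoding dimension 64, and an α weight grown from about 1e-7 to about 0.1). The sketch below collects them into one configuration and illustrates one plausible α annealing schedule; the linear ramp is an assumption, since the paper only says α starts small and is gradually increased.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered for readability.
CONFIG = {
    "image_size": 64,          # 64x64 input frames
    "fps": 6.25,               # subsampled frame rate
    "context_frames": 5,       # frames given to each method as context
    "train_pred_frames": 32,   # frames predicted during training
    "eval_pred_frames": 126,   # frames predicted at evaluation time
    "encoding_dim": 64,        # encoder output dimension
    "alpha_start": 1e-7,       # initial value of the alpha loss weight
    "alpha_end": 0.1,          # approximate final value after annealing
}

def alpha_schedule(step, total_steps, cfg=CONFIG):
    """Anneal alpha from alpha_start to alpha_end over training.

    The paper only states that alpha starts around 1e-7 and is gradually
    increased to around 0.1; the linear ramp here is an assumption.
    """
    frac = min(step / float(total_steps), 1.0)
    return cfg["alpha_start"] + frac * (cfg["alpha_end"] - cfg["alpha_start"])
```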