Action-Conditional Video Prediction using Deep Networks in Atari Games
Authors: Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, Satinder Singh
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results show that the proposed architectures are able to generate visually-realistic frames that are also useful for control over approximately 100-step action-conditional futures in some games. To the best of our knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs." |
| Researcher Affiliation | Academia | University of Michigan, Ann Arbor, MI 48109, USA |
| Pseudocode | No | The paper describes the architectures and training method using text and mathematical equations, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "Our implementation is based on Caffe toolbox [13]." However, it does not state that the authors' own implementation of the described methodology is publicly available, nor does it provide a direct link to it. The only link provided in the paper points to prediction videos. |
| Open Datasets | No | The paper states: "We used our replication of DQN to generate game-play video datasets using an ϵ-greedy policy with ϵ = 0.3. For each game, the dataset consists of about 500,000 training frames and 50,000 test frames with actions chosen by DQN." The authors generated their own dataset and describe its size, but do not provide access information (a link, DOI, or citation for a public repository) that would make it publicly available. (A sketch of the ϵ-greedy collection procedure follows the table.) |
| Dataset Splits | No | The paper mentions "500,000 training frames and 50,000 test frames" but does not explicitly state a separate validation split or dataset. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. It only mentions the software framework used: "Our implementation is based on Caffe toolbox [13]." |
| Software Dependencies | No | The paper mentions "Caffe toolbox [13]" as the basis of their implementation, but it does not specify a version number for Caffe or any other software dependencies. |
| Experiment Setup | Yes | "We use the curriculum learning scheme above with three phases of increasing prediction step objectives of 1, 3 and 5 steps, and learning rates of 10^-4, 10^-5, and 10^-5, respectively. RMSProp [34, 10] is used with momentum of 0.9, (squared) gradient momentum of 0.95, and min squared gradient of 0.01. The batch size for each training phase is 32, 8, and 8 for the feedforward encoding network and 4, 4, and 4 for the recurrent encoding network, respectively. When the recurrent encoding network is trained on the 1-step prediction objective, the network is unrolled through 20 steps and predicts the last 10 frames by taking ground-truth images as input. Gradients are clipped at [-0.1, 0.1] before the non-linearity of each LSTM gate, as suggested by [10]." (A hedged training-loop sketch of this schedule follows the table.) |
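
For concreteness, here is a minimal sketch of the ϵ-greedy data-collection procedure quoted under Open Datasets. The `dqn` and `env` objects are hypothetical stand-ins for the authors' DQN replication and the Atari emulator, neither of which is released; only ϵ = 0.3 and the dataset sizes come from the paper.

```python
import random

EPSILON = 0.3  # exploration rate reported in the paper

def epsilon_greedy_action(q_values, epsilon=EPSILON):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def collect_frames(env, dqn, num_frames):
    """Hypothetical collection loop pairing each frame with the chosen action."""
    dataset = []
    obs = env.reset()
    for _ in range(num_frames):
        action = epsilon_greedy_action(dqn.q_values(obs))
        next_obs, _, done, _ = env.step(action)
        dataset.append((obs, action))  # frame plus the action taken from it
        obs = env.reset() if done else next_obs
    return dataset
```

Per the quoted passage, a loop like this would be run until roughly 500,000 training frames and 50,000 test frames were collected per game.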
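
The Experiment Setup row can likewise be read as a three-phase curriculum schedule. The sketch below is a PyTorch approximation, not the authors' Caffe code: the prediction network is replaced by a placeholder module, the k-step loss is a dummy squared error on random tensors, and value clipping of parameter gradients stands in for the paper's clipping of gradients before each LSTM gate non-linearity. Phase lengths, learning rates, feedforward batch sizes, and the RMSProp settings are taken from the quoted text; the mapping onto `torch.optim.RMSprop` arguments (alpha as squared-gradient momentum, eps standing in for "min squared gradient") is an assumption.

```python
import torch

# Three curriculum phases from the paper: (prediction steps, learning rate,
# feedforward-encoding batch size).
PHASES = [
    {"k": 1, "lr": 1e-4, "batch": 32},
    {"k": 3, "lr": 1e-5, "batch": 8},
    {"k": 5, "lr": 1e-5, "batch": 8},
]

model = torch.nn.Linear(16, 16)  # placeholder for the (unreleased) prediction network

def k_step_loss(model, frames, k):
    # Dummy k-step objective: roll the model forward k times and compare each
    # prediction to the corresponding "ground-truth" frame.
    pred = frames[:, 0]
    loss = frames.new_zeros(())
    for t in range(1, k + 1):
        pred = model(pred)
        loss = loss + ((pred - frames[:, t]) ** 2).mean()
    return loss

for phase in PHASES:
    # Assumed mapping of the paper's RMSProp hyperparameters onto PyTorch's
    # RMSprop: momentum=0.9, alpha (squared-gradient momentum)=0.95, and
    # eps=0.01 in place of "min squared gradient".
    opt = torch.optim.RMSprop(model.parameters(), lr=phase["lr"],
                              alpha=0.95, eps=0.01, momentum=0.9)
    for _ in range(10):  # stand-in for iterating over the real dataset
        frames = torch.randn(phase["batch"], phase["k"] + 1, 16)  # dummy frames
        loss = k_step_loss(model, frames, phase["k"])
        opt.zero_grad()
        loss.backward()
        # Crude stand-in for the paper's clipping of gradients to [-0.1, 0.1]
        # before each LSTM gate non-linearity.
        torch.nn.utils.clip_grad_value_(model.parameters(), 0.1)
        opt.step()
```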