Action-Conditional Video Prediction using Deep Networks in Atari Games
Authors: Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, Satinder Singh
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results show that the proposed architectures are able to generate visually-realistic frames that are also useful for control over approximately 100-step action-conditional futures in some games. To the best of our knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs." |
| Researcher Affiliation | Academia | University of Michigan, Ann Arbor, MI 48109, USA |
| Pseudocode | No | The paper describes the architectures and training method using text and mathematical equations, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "Our implementation is based on Caffe toolbox [13]." However, it does not state that the authors' own implementation of the described methodology is publicly available, nor does it provide a direct link to it. The only link provided in the paper points to prediction videos. |
| Open Datasets | No | The paper states: "We used our replication of DQN to generate game-play video datasets using an ϵ-greedy policy with ϵ = 0.3. For each game, the dataset consists of about 500,000 training frames and 50,000 test frames with actions chosen by DQN." The authors generated their own dataset and describe its size, but do not provide access information (a link, DOI, or citation for a public repository) that would make it publicly available. (A sketch of the ϵ-greedy collection procedure follows the table.) |
| Dataset Splits | No | The paper mentions "500,000 training frames and 50,000 test frames" but does not explicitly state a separate validation split or dataset. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. It only mentions the software framework used: "Our implementation is based on Caffe toolbox [13]." |
| Software Dependencies | No | The paper mentions "Caffe toolbox [13]" as the basis of their implementation, but it does not specify a version number for Caffe or any other software dependencies. |
| Experiment Setup | Yes | "We use the curriculum learning scheme above with three phases of increasing prediction step objectives of 1, 3 and 5 steps, and learning rates of 10^-4, 10^-5, and 10^-5, respectively. RMSProp [34, 10] is used with momentum of 0.9, (squared) gradient momentum of 0.95, and min squared gradient of 0.01. The batch size for each training phase is 32, 8, and 8 for the feedforward encoding network and 4, 4, and 4 for the recurrent encoding network, respectively. When the recurrent encoding network is trained on the 1-step prediction objective, the network is unrolled through 20 steps and predicts the last 10 frames by taking ground-truth images as input. Gradients are clipped at [-0.1, 0.1] before the non-linearity of each LSTM gate, as suggested by [10]." (A hedged training-loop sketch of this schedule follows the table.) |
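
For concreteness, here is a minimal sketch of the ϵ-greedy data-collection procedure quoted under Open Datasets. The `dqn` and `env` objects are hypothetical stand-ins for the authors' DQN replication and the Atari emulator, neither of which is released; only ϵ = 0.3 and the dataset sizes come from the paper.

```python
import random

EPSILON = 0.3  # exploration rate reported in the paper

def epsilon_greedy_action(q_values, epsilon=EPSILON):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def collect_frames(env, dqn, num_frames):
    """Hypothetical collection loop pairing each frame with the chosen action."""
    dataset = []
    obs = env.reset()
    for _ in range(num_frames):
        action = epsilon_greedy_action(dqn.q_values(obs))
        next_obs, _, done, _ = env.step(action)
        dataset.append((obs, action))  # frame plus the action taken from it
        obs = env.reset() if done else next_obs
    return dataset
```

Per the quoted passage, a loop like this would be run until roughly 500,000 training frames and 50,000 test frames were collected per game.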
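
The Experiment Setup row can likewise be read as a three-phase curriculum schedule. The sketch below is a PyTorch approximation, not the authors' Caffe code: the prediction network is replaced by a placeholder module, the k-step loss is a dummy squared error on random tensors, and value clipping of parameter gradients stands in for the paper's clipping of gradients before each LSTM gate non-linearity. Phase lengths, learning rates, feedforward batch sizes, and the RMSProp settings are taken from the quoted text; the mapping onto `torch.optim.RMSprop` arguments (alpha as squared-gradient momentum, eps standing in for "min squared gradient") is an assumption.

```python
import torch

# Three curriculum phases from the paper: (prediction steps, learning rate,
# feedforward-encoding batch size).
PHASES = [
    {"k": 1, "lr": 1e-4, "batch": 32},
    {"k": 3, "lr": 1e-5, "batch": 8},
    {"k": 5, "lr": 1e-5, "batch": 8},
]

model = torch.nn.Linear(16, 16)  # placeholder for the (unreleased) prediction network

def k_step_loss(model, frames, k):
    # Dummy k-step objective: roll the model forward k times and compare each
    # prediction to the corresponding "ground-truth" frame.
    pred = frames[:, 0]
    loss = frames.new_zeros(())
    for t in range(1, k + 1):
        pred = model(pred)
        loss = loss + ((pred - frames[:, t]) ** 2).mean()
    return loss

for phase in PHASES:
    # Assumed mapping of the paper's RMSProp hyperparameters onto PyTorch's
    # RMSprop: momentum=0.9, alpha (squared-gradient momentum)=0.95, and
    # eps=0.01 in place of "min squared gradient".
    opt = torch.optim.RMSprop(model.parameters(), lr=phase["lr"],
                              alpha=0.95, eps=0.01, momentum=0.9)
    for _ in range(10):  # stand-in for iterating over the real dataset
        frames = torch.randn(phase["batch"], phase["k"] + 1, 16)  # dummy frames
        loss = k_step_loss(model, frames, phase["k"])
        opt.zero_grad()
        loss.backward()
        # Crude stand-in for the paper's clipping of gradients to [-0.1, 0.1]
        # before each LSTM gate non-linearity.
        torch.nn.utils.clip_grad_value_(model.parameters(), 0.1)
        opt.step()
```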