Learning to Act by Predicting the Future
Authors: Alexey Dosovitskiy, Vladlen Koltun
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments. |
| Researcher Affiliation | Industry | Alexey Dosovitskiy, Intel Labs; Vladlen Koltun, Intel Labs |
| Pseudocode | No | The paper describes the network structure and training process in text and diagrams, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper links to implementations of the baseline methods (DQN, DSR, A3C) by other authors, but it does not state that source code for its own method is openly available. |
| Open Datasets | No | The paper uses scenarios built within the ViZDoom platform and custom-designed environments (D3, D4, D3-tx, D4-tx). It does not provide specific access information or links to these datasets for public use. |
| Dataset Splits | No | The paper describes training and testing phases where the agent collects experiences, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts in the traditional sense of a static dataset. |
| Hardware Specification | No | The paper mentions that the simulation runs 'on a single CPU core' but does not specify exact CPU models, GPU types, or other detailed hardware specifications used for training and experimentation. |
| Software Dependencies | No | The paper mentions the ViZDoom platform and the Adam optimization algorithm, and links to GitHub repositories for baseline implementations (DQN, DSR, A3C). However, it does not provide specific version numbers for software dependencies or libraries used in their own implementation. |
| Experiment Setup | Yes | We set the temporal offsets τ₁, ..., τₙ of predicted future measurements to 1, 2, 4, 8, 16, and 32 steps in all experiments. Only the latest three time steps contribute to the objective function, with coefficients (0.5, 0.5, 1). We maintain an experience memory of M = 20,000 steps and sample a mini-batch of N = 64 samples after every k = 64 new experiences are added. The networks in all experiments were trained using the Adam algorithm (Kingma & Ba, 2015) with β₁ = 0.95, β₂ = 0.999, and ε = 10⁻⁴. The initial learning rate is set to 10⁻⁴ and is gradually decreased during training. (A configuration sketch of these settings follows the table.) |
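
As a minimal illustration of the settings quoted in the Experiment Setup row, the sketch below wires the same hyperparameters into a toy PyTorch training loop. The paper does not name a framework; the linear stand-in network, the 128-dimensional input features, the three-measurement assumption, and the random targets are all hypothetical placeholders. Only the numeric values (offsets, loss coefficients, M, N, k, and the Adam settings) come from the paper.

```python
from collections import deque
import random

import torch
import torch.nn as nn

OFFSETS = [1, 2, 4, 8, 16, 32]                 # temporal offsets of predicted measurements
LOSS_WEIGHTS = [0.0, 0.0, 0.0, 0.5, 0.5, 1.0]  # only the latest three offsets contribute

M, N, K = 20_000, 64, 64   # memory capacity, mini-batch size, steps between updates
NUM_MEAS = 3               # hypothetical measurement count (e.g. ammo, health, frags)
FEAT_DIM = 128             # hypothetical dimension of the encoded (image, measurements, goal)

# Toy stand-in for the paper's perception/measurement/goal network.
model = nn.Linear(FEAT_DIM, len(OFFSETS) * NUM_MEAS)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-4, betas=(0.95, 0.999), eps=1e-4)

memory = deque(maxlen=M)   # FIFO experience memory holding the latest M steps


def loss_fn(pred, target):
    # Squared error on predicted future measurements, weighted per offset.
    w = torch.tensor(LOSS_WEIGHTS).repeat_interleave(NUM_MEAS)
    return ((pred - target) ** 2 * w).mean()


steps_since_update = 0
for step in range(1_000):  # stand-in for interaction with the environment
    features = torch.randn(FEAT_DIM)               # placeholder encoded observation
    target = torch.randn(len(OFFSETS) * NUM_MEAS)  # placeholder future-measurement targets
    memory.append((features, target))
    steps_since_update += 1
    if steps_since_update == K and len(memory) >= N:
        batch = random.sample(list(memory), N)
        x = torch.stack([b[0] for b in batch])
        y = torch.stack([b[1] for b in batch])
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        steps_since_update = 0
```

The learning-rate decay mentioned in the row is omitted here, since the paper states only that the initial rate of 10⁻⁴ is gradually decreased during training.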