Learning to Act by Predicting the Future
Authors: Alexey Dosovitskiy, Vladlen Koltun
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments. |
| Researcher Affiliation | Industry | Alexey Dosovitskiy, Intel Labs; Vladlen Koltun, Intel Labs |
| Pseudocode | No | The paper describes the network structure and training process in text and diagrams, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper links to implementations of the baseline methods (DQN, DSR, A3C) by other authors, but it does not state that source code for its own method is openly available. |
| Open Datasets | No | The paper uses scenarios built within the ViZDoom platform and custom-designed environments (D3, D4, D3-tx, D4-tx). It does not provide specific access information or links to these datasets for public use. |
| Dataset Splits | No | The paper describes training and testing phases where the agent collects experiences, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts in the traditional sense of a static dataset. |
| Hardware Specification | No | The paper mentions that the simulation runs 'on a single CPU core' but does not specify exact CPU models, GPU types, or other detailed hardware specifications used for training and experimentation. |
| Software Dependencies | No | The paper mentions the ViZDoom platform and the Adam optimization algorithm, and links to GitHub repositories for baseline implementations (DQN, DSR, A3C). However, it does not provide specific version numbers for software dependencies or libraries used in their own implementation. |
| Experiment Setup | Yes | We set the temporal offsets τ₁, ..., τₙ of predicted future measurements to 1, 2, 4, 8, 16, and 32 steps in all experiments. Only the latest three time steps contribute to the objective function, with coefficients (0.5, 0.5, 1). We maintain an experience memory of M = 20,000 steps and sample a mini-batch of N = 64 samples after every k = 64 new experiences are added. The networks in all experiments were trained using the Adam algorithm (Kingma & Ba, 2015) with β₁ = 0.95, β₂ = 0.999, and ε = 10⁻⁴. The initial learning rate is set to 10⁻⁴ and is gradually decreased during training. (A configuration sketch of these settings follows the table.) |
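
As a minimal illustration of the settings quoted in the Experiment Setup row, the sketch below wires the same hyperparameters into a toy PyTorch training loop. The paper does not name a framework; the linear stand-in network, the 128-dimensional input features, the three-measurement assumption, and the random targets are all hypothetical placeholders. Only the numeric values (offsets, loss coefficients, M, N, k, and the Adam settings) come from the paper.

```python
from collections import deque
import random

import torch
import torch.nn as nn

OFFSETS = [1, 2, 4, 8, 16, 32]                 # temporal offsets of predicted measurements
LOSS_WEIGHTS = [0.0, 0.0, 0.0, 0.5, 0.5, 1.0]  # only the latest three offsets contribute

M, N, K = 20_000, 64, 64   # memory capacity, mini-batch size, steps between updates
NUM_MEAS = 3               # hypothetical measurement count (e.g. ammo, health, frags)
FEAT_DIM = 128             # hypothetical dimension of the encoded (image, measurements, goal)

# Toy stand-in for the paper's perception/measurement/goal network.
model = nn.Linear(FEAT_DIM, len(OFFSETS) * NUM_MEAS)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-4, betas=(0.95, 0.999), eps=1e-4)

memory = deque(maxlen=M)   # FIFO experience memory holding the latest M steps


def loss_fn(pred, target):
    # Squared error on predicted future measurements, weighted per offset.
    w = torch.tensor(LOSS_WEIGHTS).repeat_interleave(NUM_MEAS)
    return ((pred - target) ** 2 * w).mean()


steps_since_update = 0
for step in range(1_000):  # stand-in for interaction with the environment
    features = torch.randn(FEAT_DIM)               # placeholder encoded observation
    target = torch.randn(len(OFFSETS) * NUM_MEAS)  # placeholder future-measurement targets
    memory.append((features, target))
    steps_since_update += 1
    if steps_since_update == K and len(memory) >= N:
        batch = random.sample(list(memory), N)
        x = torch.stack([b[0] for b in batch])
        y = torch.stack([b[1] for b in batch])
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        steps_since_update = 0
```

The learning-rate decay mentioned in the row is omitted here, since the paper states only that the initial rate of 10⁻⁴ is gradually decreased during training.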