Learning to Navigate in Complex Environments

Authors: Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach using five 3D maze environments and demonstrate the accelerated learning and increased performance of the proposed agent architecture. These environments feature complex geometry, random start position and orientation, dynamic goal locations, and long episodes that require thousands of agent steps (see Figure 1).
Researcher Affiliation | Industry | Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell. DeepMind, London, UK. {piotrmirowski, razp, fviola, soyer, aybd, abanino, mdenil, goroshin, sifre, korayk, dkumaran, raia}@google.com
Pseudocode | No | The paper describes network architectures and training details but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'The environments used in this paper are publicly available at https://github.com/deepmind/lab.' This refers to the environments, not the source code for the agent and training methodology described in the paper.
Open Datasets | Yes | We consider a set of first-person 3D mazes from the DeepMind Lab environment (Beattie et al., 2016) (see Fig. 1)... The environments used in this paper are publicly available at https://github.com/deepmind/lab. (A minimal environment-loading sketch follows the table.)
Dataset Splits | No | The paper describes environment dimensions and episode lengths but does not explicitly provide training/validation/test dataset splits. It mentions '100 test episodes' but no formal split percentages or sample counts for training and validation.
Hardware Specification | No | The paper mentions 'We use 16 workers' but does not specify any particular hardware (GPU, CPU, etc.) used for running the experiments.
Software Dependencies | No | The paper mentions algorithms like A3C and RMSProp but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x, CUDA 11.x). (An RMSProp sketch follows the table.)
Experiment Setup | Yes | Learning rate was sampled from [10^-4, 5×10^-4]. Strength of the entropy regularization from [10^-4, 10^-3]. ... Gradients are computed over non-overlapping chunks of 50 or 75 steps of the episode. The auxiliary tasks, when used, have hyperparameters sampled from: coefficient β_d of the depth prediction loss from convnet features L_d sampled from {3.33, 10, 33}; coefficient β_d' of the depth prediction loss from LSTM hiddens L_d' sampled from {1, 3.33, 10}; coefficient β_l of the loop closure prediction loss L_l sampled from {1, 3.33, 10}. (Sampling and loss-combination sketches follow the table.)
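
The Open Datasets row above points at the DeepMind Lab repository. As a minimal sketch of what using those mazes looks like, the snippet below loads a level and steps it once through the standard deepmind_lab Python API; the level name 'nav_maze_static_01', the 84x84 frame size, and the action repeat of 4 are assumptions rather than details quoted from the paper.

```python
# Minimal sketch: load a DeepMind Lab maze and take one step.
# Level name, frame size, and action repeat are assumptions.
import numpy as np
import deepmind_lab

env = deepmind_lab.Lab(
    'nav_maze_static_01',                    # assumed level from github.com/deepmind/lab
    ['RGB_INTERLEAVED'],                     # first-person RGB observation
    config={'width': '84', 'height': '84'})

env.reset()
noop = np.zeros(len(env.action_spec()), dtype=np.intc)  # all-zero action vector
reward = env.step(noop, num_steps=4)                    # repeat the action for 4 frames
frame = env.observations()['RGB_INTERLEAVED']           # (84, 84, 3) uint8 frame
```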
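
The Software Dependencies row names A3C and RMSProp without further detail. For reference, a plain-Python sketch of the standard RMSProp update used with A3C is below; the decay and epsilon values are common A3C defaults, not values taken from this paper.

```python
def rmsprop_update(param, grad, mean_square, lr, decay=0.99, eps=0.1):
    """One RMSProp step; decay/eps are typical A3C defaults, not from the paper."""
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    param = param - lr * grad / (mean_square + eps) ** 0.5
    return param, mean_square
```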
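
The Experiment Setup row gives sampling intervals and candidate sets for the hyperparameter search. A small sketch of reproducing that sampling follows; log-uniform draws over the continuous intervals are an assumption (the row states only the intervals), and all dictionary keys are illustrative names.

```python
import math
import random

def sample_hyperparameters():
    """Draw one configuration from the ranges in the Experiment Setup row.

    Log-uniform sampling over the continuous intervals is an assumption;
    the row states only the intervals and the discrete candidate sets.
    """
    def log_uniform(lo, hi):
        return 10.0 ** random.uniform(math.log10(lo), math.log10(hi))

    return {
        'learning_rate': log_uniform(1e-4, 5e-4),
        'entropy_cost':  log_uniform(1e-4, 1e-3),
        'unroll_length': random.choice([50, 75]),        # non-overlapping gradient chunks
        'beta_d_conv':   random.choice([3.33, 10, 33]),  # depth loss from convnet features
        'beta_d_lstm':   random.choice([1, 3.33, 10]),   # depth loss from LSTM hiddens
        'beta_loop':     random.choice([1, 3.33, 10]),   # loop-closure loss
    }
```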
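
Those beta coefficients weight auxiliary losses added to the main A3C objective. A hedged sketch of the combination is below; the argument names are invented for illustration, and the exact form of each loss term is defined in the paper, not here.

```python
def combined_loss(a3c_loss, depth_conv_loss, depth_lstm_loss, loop_loss, hp):
    # Weighted sum of the A3C objective and the three auxiliary losses,
    # using the beta coefficients sampled above. All names are illustrative.
    return (a3c_loss
            + hp['beta_d_conv'] * depth_conv_loss
            + hp['beta_d_lstm'] * depth_lstm_loss
            + hp['beta_loop'] * loop_loss)
```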