Emergence of Maps in the Memories of Blind Navigation Agents
Authors: Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent forgets exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation. |
| Researcher Affiliation | Collaboration | Erik Wijmans (1,2), Manolis Savva (2,3), Irfan Essa (1,4), Stefan Lee (5), Ari S. Morcos (2), Dhruv Batra (1,2) — 1: Georgia Institute of Technology; 2: FAIR, Meta AI; 3: Simon Fraser University; 4: Google Research Atlanta; 5: Oregon State University |
| Pseudocode | No | The paper describes the architecture and training procedures in detail within the text (e.g., Section A.1) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our analysis code will be open-sourced. |
| Open Datasets | Yes | We train navigation agents for Point Goal Nav in virtualized 3D replicas of real houses utilizing the AI Habitat simulator (Savva et al., 2019; Szot et al., 2021) and Gibson (Xia et al., 2018) and Matterport3D (Chang et al., 2017) datasets. |
| Dataset Splits | No | The paper mentions using a 'validation dataset' for early stopping in several sections (e.g., A.3, A.4, A.5), for instance, 'We use the validation dataset to perform early-stopping.' However, it does not provide specific details on the split percentages or sample counts for this validation set. |
| Hardware Specification | No | The paper mentions using '16 GPUs' for training ('We use Decentralized Distributed PPO (DD-PPO) (Wijmans et al., 2020) to train on 16 GPUs.'), but it does not specify the make, model, or any other details of these GPUs or any other hardware components (e.g., CPUs, memory). |
| Software Dependencies | No | The paper mentions various software components and algorithms such as 'LSTM', 'DD-PPO', 'Adam optimizer', 'PPO', 'GAE', 'Coord Conv', 'Res Net50', 'Focal Loss', and 'Huber Loss', along with citations for some. However, it does not provide specific version numbers for any of these software dependencies or for the underlying programming languages/frameworks (e.g., Python, PyTorch). |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 2.5 × 10^-4. We set the discount factor γ to 0.99, the PPO clip to 0.2, and the GAE hyper-parameter τ to 0.95. We train until convergence (around 2 billion steps of experience). |
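The hyper-parameters reported in the Experiment Setup row can be gathered into a small configuration sketch. The GAE helper below is an illustrative implementation using the reported γ and τ, not the authors' code; the variable and function names (`config`, `gae_advantages`) are assumptions for the sake of the example.

```python
# Hyper-parameters as reported in the paper's experiment setup.
# Names here are illustrative, not taken from the authors' code.
config = {
    "optimizer": "Adam",            # Kingma & Ba, 2015
    "learning_rate": 2.5e-4,
    "gamma": 0.99,                  # discount factor
    "ppo_clip": 0.2,
    "gae_tau": 0.95,                # GAE hyper-parameter
    "train_steps": 2_000_000_000,   # ~2 billion steps of experience
}

def gae_advantages(rewards, values, gamma=0.99, tau=0.95):
    """Generalized Advantage Estimation with the paper's gamma and tau.

    Computes advantages for a single finished episode, sweeping backward
    so each step's advantage accumulates discounted future TD errors.
    """
    advantages = []
    gae = 0.0
    next_value = 0.0  # terminal state bootstraps with zero value
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v   # one-step TD error
        gae = delta + gamma * tau * gae      # discounted accumulation
        advantages.append(gae)
        next_value = v
    return list(reversed(advantages))
```

For example, a two-step episode with a reward only at the end propagates a discounted advantage back to the first step via the product γτ.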