Emergence of Maps in the Memories of Blind Navigation Agents
Authors: Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent forgets exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation. |
| Researcher Affiliation | Collaboration | Erik Wijmans (1,2), Manolis Savva (2,3), Irfan Essa (1,4), Stefan Lee (5), Ari S. Morcos (2), Dhruv Batra (1,2) — 1: Georgia Institute of Technology; 2: FAIR, Meta AI; 3: Simon Fraser University; 4: Google Research Atlanta; 5: Oregon State University |
| Pseudocode | No | The paper describes the architecture and training procedures in detail within the text (e.g., Section A.1) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our analysis code will be open-sourced. |
| Open Datasets | Yes | We train navigation agents for Point Goal Nav in virtualized 3D replicas of real houses utilizing the AI Habitat simulator (Savva et al., 2019; Szot et al., 2021) and Gibson (Xia et al., 2018) and Matterport3D (Chang et al., 2017) datasets. |
| Dataset Splits | No | The paper mentions using a 'validation dataset' for early stopping in several sections (e.g., A.3, A.4, A.5), for instance, 'We use the validation dataset to perform early-stopping.' However, it does not provide specific details on the split percentages or sample counts for this validation set. |
| Hardware Specification | No | The paper mentions using '16 GPUs' for training ('We use Decentralized Distributed PPO (DD-PPO) (Wijmans et al., 2020) to train on 16 GPUs.'), but it does not specify the make, model, or any other details of these GPUs or any other hardware components (e.g., CPUs, memory). |
| Software Dependencies | No | The paper mentions various software components and algorithms such as 'LSTM', 'DD-PPO', 'Adam optimizer', 'PPO', 'GAE', 'Coord Conv', 'Res Net50', 'Focal Loss', and 'Huber Loss', along with citations for some. However, it does not provide specific version numbers for any of these software dependencies or for the underlying programming languages/frameworks (e.g., Python, PyTorch). |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 2.5 × 10^-4. We set the discount factor γ to 0.99, the PPO clip to 0.2, and the GAE hyper-parameter τ to 0.95. We train until convergence (around 2 billion steps of experience). |
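The hyper-parameters reported in the Experiment Setup row can be gathered into a small configuration sketch. The GAE helper below is an illustrative implementation using the reported γ and τ, not the authors' code; the variable and function names (`config`, `gae_advantages`) are assumptions for the sake of the example.

```python
# Hyper-parameters as reported in the paper's experiment setup.
# Names here are illustrative, not taken from the authors' code.
config = {
    "optimizer": "Adam",            # Kingma & Ba, 2015
    "learning_rate": 2.5e-4,
    "gamma": 0.99,                  # discount factor
    "ppo_clip": 0.2,
    "gae_tau": 0.95,                # GAE hyper-parameter
    "train_steps": 2_000_000_000,   # ~2 billion steps of experience
}

def gae_advantages(rewards, values, gamma=0.99, tau=0.95):
    """Generalized Advantage Estimation with the paper's gamma and tau.

    Computes advantages for a single finished episode, sweeping backward
    so each step's advantage accumulates discounted future TD errors.
    """
    advantages = []
    gae = 0.0
    next_value = 0.0  # terminal state bootstraps with zero value
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v   # one-step TD error
        gae = delta + gamma * tau * gae      # discounted accumulation
        advantages.append(gae)
        next_value = v
    return list(reversed(advantages))
```

For example, a two-step episode with a reward only at the end propagates a discounted advantage back to the first step via the product γτ.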