DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames of Experience
Authors: Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We leverage this scaling to train an agent for 2.5 Billion steps of experience (the equivalent of 80 years of human experience), over 6 months of GPU-time training, in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the task: near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks, the analog of ImageNet pre-training + task-specific fine-tuning for embodied AI. |
| Researcher Affiliation | Collaboration | ¹Georgia Institute of Technology, ²Facebook AI Research, ³Oregon State University, ⁴Simon Fraser University |
| Pseudocode | Yes | See Fig. 9 for an example implementation, which adds 1) gradient synchronization via torch.nn.parallel.DistributedDataParallel, and 2) preemption of stragglers by tracking the number of workers that have finished the experience collection stage with a torch.distributed.TCPStore. (A minimal sketch of this pattern appears after the table.) |
| Open Source Code | Yes | Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available). Code: https://github.com/facebookresearch/habitat-api |
| Open Datasets | Yes | First, we utilize the training data released as part of the Habitat Challenge 2019, consisting of 72 scenes from the Gibson dataset (Xia et al., 2018). We then augment this with all 90 scenes in the Matterport3D dataset (Chang et al., 2017) to create a larger training set (note that Matterport3D meshes tend to be larger and of better quality). |
| Dataset Splits | Yes | Table 1: Performance (higher is better) of different architectures for agents with RGB-D and GPS+Compass sensors on the Habitat Challenge 2019 (Savva et al., 2019) validation and test-std splits (checkpoint selected on val). |
| Hardware Specification | Yes | We benchmark training our ResNet50 PointGoalNav agent with Depth on a cluster with Nvidia V100 GPUs and NCCL 2.4.7 with Infiniband interconnect. |
| Software Dependencies | Yes | We leverage PyTorch's (Paszke et al., 2017) DistributedDataParallel to synchronize gradients, and TCPStore, a simple distributed key-value store, to track how many workers have finished collecting experience. See Apx. E for a detailed description with code. See Fig. 9 for an example implementation, which adds 1) gradient synchronization via torch.nn.parallel.DistributedDataParallel, and 2) preemption of stragglers by tracking the number of workers that have finished the experience collection stage with a torch.distributed.TCPStore. |
| Experiment Setup | Yes | Training. We use PPO with Generalized Advantage Estimation (Schulman et al., 2015). We set the discount factor γ to 0.99 and the GAE parameter τ to 0.95. Each worker collects (up to) 128 frames of experience from 4 agents running in parallel (all in different environments) and then performs 2 epochs of PPO with 2 mini-batches per epoch. We use Adam (Kingma & Ba, 2014) with a learning rate of 2.5 × 10⁻⁴. Unlike popular implementations of PPO, we do not normalize advantages, as we find this leads to instabilities. We use DD-PPO to train with 64 workers on 64 GPUs. (A sketch of the GAE computation with these hyperparameters appears after the table.) |
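
As a companion to the Pseudocode and Software Dependencies rows, the following is a minimal sketch of the DD-PPO worker loop they describe: gradient synchronization via torch.nn.parallel.DistributedDataParallel plus straggler preemption via a torch.distributed.TCPStore counter. The helpers `collect_one_step` and `ppo_update`, the port number, and the preemption constants are hypothetical placeholders for illustration, not the released habitat-api implementation.

```python
import os

import torch
import torch.distributed as distrib

NUM_STEPS = 128        # rollout length per worker (see Experiment Setup row)
SYNC_FRAC = 0.6        # preempt once this fraction of workers has finished (illustrative)
MIN_STEPS_FRAC = 0.25  # never preempt before this fraction of the rollout (illustrative)


def train_worker(model, rollouts, num_updates):
    # One process per GPU; NCCL all-reduces gradients during backward().
    distrib.init_process_group(backend="nccl")
    rank, world_size = distrib.get_rank(), distrib.get_world_size()

    # Simple distributed key-value store used only to count finished workers.
    num_done_store = distrib.TCPStore(
        os.environ["MASTER_ADDR"], 12345, world_size, is_master=(rank == 0)
    )

    # DistributedDataParallel synchronizes gradients across workers.
    model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[torch.cuda.current_device()]
    )

    for _ in range(num_updates):
        num_done_store.set("num_done", "0")
        distrib.barrier()  # ensure the counter is reset before anyone collects

        # 1) Experience collection, cut short if this worker is a straggler.
        for step in range(NUM_STEPS):
            collect_one_step(model, rollouts)  # hypothetical env-stepping helper
            enough_done = (
                int(num_done_store.get("num_done")) >= SYNC_FRAC * world_size
            )
            if step >= MIN_STEPS_FRAC * NUM_STEPS and enough_done:
                break
        num_done_store.add("num_done", 1)

        # 2) Optimization: PPO epochs over whatever experience was collected;
        #    DDP averages the gradients so all workers stay in sync.
        ppo_update(model, rollouts)  # hypothetical PPO/GAE update helper
```

The point of the preemption counter is that the slowest simulators no longer gate every update: once enough workers have finished collecting, the rest stop early and move on to the synchronized gradient step, trading a small amount of experience for much better scaling.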
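
For the Experiment Setup row, this is a minimal sketch of Generalized Advantage Estimation with the listed hyperparameters (γ = 0.99, τ = 0.95). The tensor layout ([T, num_envs]) and the mask convention are assumptions for illustration; this is not the authors' implementation.

```python
import torch

GAMMA = 0.99          # discount factor from the Experiment Setup row
TAU = 0.95            # GAE parameter from the Experiment Setup row
LR = 2.5e-4           # Adam learning rate from the Experiment Setup row
PPO_EPOCHS = 2        # PPO epochs per update
NUM_MINI_BATCHES = 2  # mini-batches per epoch


def compute_gae_returns(rewards, values, masks, next_value):
    """Compute GAE(GAMMA, TAU) returns for one rollout.

    rewards, values, masks: tensors of shape [T, num_envs]
    next_value: value estimate for the state after the last step, shape [num_envs]
    masks[t] is 0.0 where the episode ended at step t, else 1.0 (assumed convention).
    """
    T = rewards.size(0)
    returns = torch.zeros_like(rewards)
    values = torch.cat([values, next_value.unsqueeze(0)], dim=0)  # [T + 1, num_envs]
    gae = torch.zeros_like(next_value)
    for t in reversed(range(T)):
        delta = rewards[t] + GAMMA * values[t + 1] * masks[t] - values[t]
        gae = delta + GAMMA * TAU * masks[t] * gae
        returns[t] = gae + values[t]
    return returns
```

Advantages would then be the returns minus the predicted values; per the paper they are used without normalization, which the authors found avoids instabilities. The constants above mirror the 2 epochs × 2 mini-batches schedule, and the optimizer would be built as torch.optim.Adam(model.parameters(), lr=LR).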