Large Batch Simulation for Deep Reinforcement Learning
Authors: Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. |
| Researcher Affiliation | Collaboration | Stanford University, Georgia Institute of Technology, Intel Labs, University of Southern California, Simon Fraser University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | To facilitate such efforts, our system is available open-source at https://github.com/shacklettbp/bps-nav. |
| Open Datasets | Yes | Depth agents are trained on Gibson-2plus (Xia et al., 2018) and, consistent with Wijmans et al. (2020), RGB agents are also trained on Matterport3D (Chang et al., 2017). |
| Dataset Splits | Yes | Agents are evaluated on the Gibson dataset (Xia et al., 2018). We use two metrics: Success, whether or not the agent reached the goal, and SPL (Anderson et al., 2018), a measure of both Success and efficiency of the agent's path (a sketch of the SPL computation appears below the table). We perform policy evaluation using Habitat-Sim (Savva et al., 2019), unmodified for direct comparability to prior work. ... Table 2: Policy performance. SPL and Success of agents produced by BPS and WIJMANS20. The performance of the BPS agent is within the margin of error of the WIJMANS20 agent for Depth experiments on the validation set, and within five percent on RGB. BPS agents are trained on eight GPUs with aggregate batch size N=1024. |
| Hardware Specification | Yes | Results are reported across three models of NVIDIA GPUs: Tesla V100, GeForce RTX 2080 Ti, and GeForce RTX 3090. (The different GPUs are also accompanied by different CPUs, see Appendix C.) ... Tesla V100 benchmarking was done with 2x Intel Xeon E5-2698 v4 (a DGX-1 station). RTX 2080 Ti benchmarking was done with 2x Intel Xeon Gold 6226. RTX 3090 benchmarking was done with 1x Intel i7-5820k. |
| Software Dependencies | No | The paper mentions software like 'Habitat-Sim' and the 'Apex library in O2 mode', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | To further accelerate the policy DNN workload, BPS uses half-precision inference and mixed-precision training. ... BPS uses the largest batch size that fits in GPU memory, subject to the constraint that no one scene asset can be shared by more than 32 environments in the batch. ... BPS limits per-GPU batch size to N=128, with K=4 active scenes per GPU. ... Table A4: Hyper-parameters used for BPS training on 8 GPUs. PPO Epochs 1, PPO Mini-Batches 2, PPO Clip 0.2, Learning rate 5.0 × 10⁻⁴ (Depth), 2.5 × 10⁻⁴ (RGB), Number of Environments (N) 128, Rollout length (L) 32 (these values are gathered into a config sketch below the table). |
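
The SPL metric cited in the Dataset Splits row is Success weighted by (normalized inverse) Path Length from Anderson et al. (2018). A minimal sketch of its computation, assuming per-episode arrays as input (the function and argument names are ours, not from the paper's released code):

```python
import numpy as np

def spl(successes, shortest_path_lengths, agent_path_lengths):
    """SPL per Anderson et al. (2018): mean of S_i * l_i / max(p_i, l_i).

    successes             -- 1.0 if the episode reached the goal, else 0.0 (S_i)
    shortest_path_lengths -- geodesic distance from start to goal (l_i)
    agent_path_lengths    -- length of the path the agent actually took (p_i)
    """
    s = np.asarray(successes, dtype=np.float64)
    l = np.asarray(shortest_path_lengths, dtype=np.float64)
    p = np.asarray(agent_path_lengths, dtype=np.float64)
    # Failed episodes contribute 0; successful ones are penalized for
    # taking paths longer than the shortest path.
    return float(np.mean(s * l / np.maximum(p, l)))
```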
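
The Experiment Setup quotes can also be collected into a single training configuration. The sketch below restates the Table A4 hyper-parameters as a Python dict (key names are ours) and shows a generic PyTorch mixed-precision training step via torch.cuda.amp. Note the paper reports using NVIDIA's Apex library in O2 mode, so amp is a stand-in assumption here, and the model is a placeholder rather than the paper's policy DNN:

```python
import torch
import torch.nn.functional as F

# Table A4 hyper-parameters for BPS training on 8 GPUs (key names are ours).
BPS_HPARAMS = {
    "ppo_epochs": 1,
    "ppo_mini_batches": 2,
    "ppo_clip": 0.2,
    "lr": {"depth": 5.0e-4, "rgb": 2.5e-4},
    "num_envs": 128,       # per-GPU batch size N
    "rollout_length": 32,  # L
    "active_scenes": 4,    # K scene assets active per GPU
}

# Placeholder network standing in for the policy DNN; requires a CUDA GPU.
model = torch.nn.Linear(128, 4).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=BPS_HPARAMS["lr"]["depth"])
scaler = torch.cuda.amp.GradScaler()

def train_step(obs, targets):
    """One mixed-precision optimization step (torch.cuda.amp variant)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in fp16 where safe
        loss = F.mse_loss(model(obs), targets)
    scaler.scale(loss).backward()     # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)            # unscale grads, then apply the update
    scaler.update()
    return loss.item()
```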