Large Batch Simulation for Deep Reinforcement Learning
Authors: Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. |
| Researcher Affiliation | Collaboration | Stanford University, Georgia Institute of Technology, Intel Labs, University of Southern California, Simon Fraser University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | To facilitate such efforts, our system is available open-source at https://github.com/shacklettbp/bps-nav. |
| Open Datasets | Yes | Depth agents are trained on Gibson-2plus (Xia et al., 2018) and, consistent with Wijmans et al. (2020), RGB agents are also trained on Matterport3D (Chang et al., 2017). |
| Dataset Splits | Yes | Agents are evaluated on the Gibson dataset (Xia et al., 2018). We use two metrics: Success, whether or not the agent reached the goal, and SPL (Anderson et al., 2018), a measure of both Success and efficiency of the agent's path (a sketch of the SPL computation appears below the table). We perform policy evaluation using Habitat-Sim (Savva et al., 2019), unmodified for direct comparability to prior work. ... Table 2: Policy performance. SPL and Success of agents produced by BPS and WIJMANS20. The performance of the BPS agent is within the margin of error of the WIJMANS20 agent for Depth experiments on the validation set, and within five percent on RGB. BPS agents are trained on eight GPUs with aggregate batch size N=1024. |
| Hardware Specification | Yes | Results are reported across three models of NVIDIA GPUs: Tesla V100, GeForce RTX 2080 Ti, and GeForce RTX 3090. (The different GPUs are also accompanied by different CPUs, see Appendix C.) ... Tesla V100 benchmarking was done with 2x Intel Xeon E5-2698 v4 (a DGX-1 station). RTX 2080 Ti benchmarking was done with 2x Intel Xeon Gold 6226. RTX 3090 benchmarking was done with 1x Intel i7-5820k. |
| Software Dependencies | No | The paper mentions software like 'Habitat-Sim' and the 'Apex library in O2 mode', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | To further accelerate the policy DNN workload, BPS uses half-precision inference and mixed-precision training. ... BPS uses the largest batch size that fits in GPU memory, subject to the constraint that no one scene asset can be shared by more than 32 environments in the batch. ... BPS limits per-GPU batch size to N=128, with K=4 active scenes per GPU. ... Table A4: Hyper-parameters used for BPS training on 8 GPUs. PPO Epochs 1, PPO Mini-Batches 2, PPO Clip 0.2, Learning rate 5.0 × 10⁻⁴ (Depth), 2.5 × 10⁻⁴ (RGB), Number of Environments (N) 128, Rollout length (L) 32 (these values are gathered into a config sketch below the table). |
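
The SPL metric cited in the Dataset Splits row is Success weighted by (normalized inverse) Path Length from Anderson et al. (2018). A minimal sketch of its computation, assuming per-episode arrays as input (the function and argument names are ours, not from the paper's released code):

```python
import numpy as np

def spl(successes, shortest_path_lengths, agent_path_lengths):
    """SPL per Anderson et al. (2018): mean of S_i * l_i / max(p_i, l_i).

    successes             -- 1.0 if the episode reached the goal, else 0.0 (S_i)
    shortest_path_lengths -- geodesic distance from start to goal (l_i)
    agent_path_lengths    -- length of the path the agent actually took (p_i)
    """
    s = np.asarray(successes, dtype=np.float64)
    l = np.asarray(shortest_path_lengths, dtype=np.float64)
    p = np.asarray(agent_path_lengths, dtype=np.float64)
    # Failed episodes contribute 0; successful ones are penalized for
    # taking paths longer than the shortest path.
    return float(np.mean(s * l / np.maximum(p, l)))
```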
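
The Experiment Setup quotes can also be collected into a single training configuration. The sketch below restates the Table A4 hyper-parameters as a Python dict (key names are ours) and shows a generic PyTorch mixed-precision training step via torch.cuda.amp. Note the paper reports using NVIDIA's Apex library in O2 mode, so amp is a stand-in assumption here, and the model is a placeholder rather than the paper's policy DNN:

```python
import torch
import torch.nn.functional as F

# Table A4 hyper-parameters for BPS training on 8 GPUs (key names are ours).
BPS_HPARAMS = {
    "ppo_epochs": 1,
    "ppo_mini_batches": 2,
    "ppo_clip": 0.2,
    "lr": {"depth": 5.0e-4, "rgb": 2.5e-4},
    "num_envs": 128,       # per-GPU batch size N
    "rollout_length": 32,  # L
    "active_scenes": 4,    # K scene assets active per GPU
}

# Placeholder network standing in for the policy DNN; requires a CUDA GPU.
model = torch.nn.Linear(128, 4).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=BPS_HPARAMS["lr"]["depth"])
scaler = torch.cuda.amp.GradScaler()

def train_step(obs, targets):
    """One mixed-precision optimization step (torch.cuda.amp variant)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in fp16 where safe
        loss = F.mse_loss(model(obs), targets)
    scaler.scale(loss).backward()     # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)            # unscale grads, then apply the update
    scaler.update()
    return loss.item()
```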