Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
Authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the capability of scaling up Q-learning methods to tens of thousands of parallel environments and investigate important factors that can affect learning speed, including the number of parallel environments, exploration strategies, batch size, GPU models, etc. The code is available at https://github.com/Improbable-AI/pql. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, USA. Correspondence to: Zechu Li <zechu@mit.edu>, Tao Chen <taochen@mit.edu>, Pulkit Agrawal <pulkitag@mit.edu>. |
| Pseudocode | Yes | We use Ray (Moritz et al., 2017) for parallelization. The pseudo-code for the scheme is shown in Algorithm 1, 2, and 3 in the appendix. |
| Open Source Code | Yes | The code is available at https://github.com/Improbable-AI/pql. |
| Open Datasets | Yes | Tasks We evaluate our method on six Isaac Gym benchmark tasks (Makoviychuk et al., 2021): Ant, Humanoid, ANYmal, Shadow Hand, Allegro Hand, and Franka Cube Stacking (see Figure 2). For more details about these tasks, please refer to (Makoviychuk et al., 2021). |
| Dataset Splits | No | The paper uses reinforcement learning tasks where data is generated through interaction with environments. It evaluates performance during training but does not describe static training/validation/test dataset splits in the typical sense of a fixed dataset. |
| Hardware Specification | Yes | Hardware We use NVIDIA GeForce RTX 3090 GPUs as our default GPUs for the experiments unless otherwise specified. More details are shown in Table B.3 in the appendix. Table B.3. Hardware configurations on different workstations. Workstation 1: CPU AMD Threadripper 3990X, GPU GeForce RTX 3090... Workstation 2: CPU Intel Xeon Gold 6248, GPU Tesla V100... Workstation 3: CPU AMD Rome 7742, GPU Tesla A100... Workstation 4: CPU Intel Xeon W-2195, GPU GeForce RTX 2080 Ti... |
| Software Dependencies | No | We use Ray (Moritz et al., 2017) for parallelization. We use Isaac Gym (Makoviychuk et al., 2021) as our simulation engine... Image data is compressed using the lz4 library to reduce the bandwidth requirement and communication overhead. |
| Experiment Setup | Yes | Table B.1. Hyper-parameter setup for six Isaac Gym benchmark tasks (PQL (ours) / DDPG / SAC): Num. Environments 4,096 / 4,096 / 4,096; Critic Learning Rate 5×10⁻⁴ / 5×10⁻⁴ / 5×10⁻⁴; Actor Learning Rate 5×10⁻⁴ / 5×10⁻⁴ / 5×10⁻⁴; ... Batch Size 8,192 / 8,192 / 8,192; Num. Epochs (βa:v) 8 / 8 / 8; Discount Factor (γ) 0.99 / 0.99 / 0.99; ... Replay Buffer Size 5×10⁶ / 5×10⁶ / 5×10⁶ |
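The hyper-parameter values quoted from Table B.1 can be collected into a small configuration sketch. This is only an illustration of the reported setup: the class and field names below are hypothetical and do not come from the released `pql` code; the numeric values are those shared by PQL, DDPG, and SAC in the quoted table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyper-parameters quoted from Table B.1 (shared across PQL, DDPG, SAC).

    Field names are illustrative; only the values are taken from the paper.
    """
    num_envs: int = 4_096            # Num. Environments
    critic_lr: float = 5e-4          # Critic Learning Rate (5x10^-4)
    actor_lr: float = 5e-4           # Actor Learning Rate (5x10^-4)
    batch_size: int = 8_192          # Batch Size
    num_epochs: int = 8              # Num. Epochs (beta_a:v)
    discount: float = 0.99           # Discount Factor (gamma)
    replay_buffer_size: int = 5_000_000  # Replay Buffer Size (5x10^6)


cfg = ExperimentConfig()
```

A sketch like this makes it easy to check that a reimplementation matches the reported setup, e.g. `cfg.replay_buffer_size == 5_000_000`.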