Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
Authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the capability of scaling up Q-learning methods to tens of thousands of parallel environments and investigate important factors that can affect learning speed, including the number of parallel environments, exploration strategies, batch size, GPU models, etc. The code is available at https://github.com/Improbable-AI/pql. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, USA. Correspondence to: Zechu Li <zechu@mit.edu>, Tao Chen <taochen@mit.edu>, Pulkit Agrawal <pulkitag@mit.edu>. |
| Pseudocode | Yes | We use Ray (Moritz et al., 2017) for parallelization. The pseudo-code for the scheme is shown in Algorithm 1, 2, and 3 in the appendix. |
| Open Source Code | Yes | The code is available at https://github.com/Improbable-AI/pql. |
| Open Datasets | Yes | Tasks We evaluate our method on six Isaac Gym benchmark tasks (Makoviychuk et al., 2021): Ant, Humanoid, ANYmal, Shadow Hand, Allegro Hand, and Franka Cube Stacking (see Figure 2). For more details about these tasks, please refer to (Makoviychuk et al., 2021). |
| Dataset Splits | No | The paper uses reinforcement learning tasks where data is generated through interaction with environments. It evaluates performance during training but does not describe static training/validation/test dataset splits in the typical sense of a fixed dataset. |
| Hardware Specification | Yes | Hardware We use NVIDIA GeForce RTX 3090 GPUs as our default GPUs for the experiments unless otherwise specified. More details are shown in Table B.3 in the appendix. Table B.3. Hardware configurations on different workstations. Workstation 1: CPU AMD Threadripper 3990X, GPU GeForce RTX 3090... Workstation 2: CPU Intel Xeon Gold 6248, GPU Tesla V100... Workstation 3: CPU AMD Rome 7742, GPU Tesla A100... Workstation 4: CPU Intel Xeon W-2195, GPU GeForce RTX 2080 Ti... |
| Software Dependencies | No | We use Ray (Moritz et al., 2017) for parallelization. We use Isaac Gym (Makoviychuk et al., 2021) as our simulation engine... Image data is compressed using the lz4 library to reduce the bandwidth requirement and communication overhead. |
| Experiment Setup | Yes | Table B.1. Hyper-parameter setup for six Isaac Gym benchmark tasks (PQL (ours) / DDPG / SAC): Num. Environments 4,096 / 4,096 / 4,096; Critic Learning Rate 5×10⁻⁴ / 5×10⁻⁴ / 5×10⁻⁴; Actor Learning Rate 5×10⁻⁴ / 5×10⁻⁴ / 5×10⁻⁴; ... Batch Size 8,192 / 8,192 / 8,192; Num. Epochs (βa:v) 8 / 8 / 8; Discount Factor (γ) 0.99 / 0.99 / 0.99; ... Replay Buffer Size 5×10⁶ / 5×10⁶ / 5×10⁶ |
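The hyper-parameter values quoted from Table B.1 can be collected into a small configuration sketch. This is only an illustration of the reported setup: the class and field names below are hypothetical and do not come from the released `pql` code; the numeric values are those shared by PQL, DDPG, and SAC in the quoted table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyper-parameters quoted from Table B.1 (shared across PQL, DDPG, SAC).

    Field names are illustrative; only the values are taken from the paper.
    """
    num_envs: int = 4_096            # Num. Environments
    critic_lr: float = 5e-4          # Critic Learning Rate (5x10^-4)
    actor_lr: float = 5e-4           # Actor Learning Rate (5x10^-4)
    batch_size: int = 8_192          # Batch Size
    num_epochs: int = 8              # Num. Epochs (beta_a:v)
    discount: float = 0.99           # Discount Factor (gamma)
    replay_buffer_size: int = 5_000_000  # Replay Buffer Size (5x10^6)


cfg = ExperimentConfig()
```

A sketch like this makes it easy to check that a reimplementation matches the reported setup, e.g. `cfg.replay_buffer_size == 5_000_000`.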