RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

Authors: Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we demonstrate the state-of-the-art sparse training performance of RLx2 with two popular DRL algorithms, TD3 (Fujimoto et al., 2018) and SAC (Haarnoja et al., 2018), on several MuJoCo (Todorov et al., 2012) continuous control tasks."
Researcher Affiliation | Academia | "Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang. Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. {tyq22, hph19}@mails.tsinghua.edu.cn, longbohuang@tsinghua.edu.cn"
Pseudocode | Yes | "The pseudo-code of our scheme is given in Algorithm 1, where ⊙ is the element-wise multiplication operator and Mθ is the binary mask to represent the sparse topology of the network θ. (...) Algorithm 1 Topology Evolution (Evci et al., 2020)"
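To make the quoted mechanism concrete, below is a minimal PyTorch sketch of one drop-and-grow topology-evolution step in the style of Evci et al. (2020), where the effective sparse parameters are weight ⊙ mask. The function name, the update_frac parameter, and the zero-initialization of grown weights are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def evolve_topology(weight, grad, mask, update_frac=0.1):
    """One drop-and-grow step in the style of RigL (Evci et al., 2020).

    weight, grad, and mask share the same shape; mask is binary
    (the Mθ of Algorithm 1), and the network effectively uses
    weight * mask (element-wise multiplication, the ⊙ above).
    """
    n_update = int(update_frac * mask.sum().item())
    if n_update == 0:
        return mask
    new_mask = mask.clone()
    flat = new_mask.view(-1)

    # Drop: deactivate the n_update active weights of smallest magnitude.
    active = torch.where(flat.bool(), weight.abs().view(-1),
                         torch.full_like(flat, float("inf")))
    flat[torch.topk(active, n_update, largest=False).indices] = 0.0

    # Grow: activate the n_update inactive weights with the largest
    # dense-gradient magnitude.
    inactive = torch.where(flat.bool(), torch.full_like(flat, float("-inf")),
                           grad.abs().view(-1))
    grow_idx = torch.topk(inactive, n_update, largest=True).indices
    flat[grow_idx] = 1.0
    with torch.no_grad():
        # Newly grown connections start at zero (RigL convention; assumed here).
        weight.view(-1)[grow_idx] = 0.0
    return new_mask
```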
Open Source Code | Yes | "The code is available at https://github.com/tyq1024/RLx2."
Open Datasets | Yes | "Our experiments are conducted in four popular MuJoCo environments: HalfCheetah-v3 (Hal.), Hopper-v3 (Hop.), Walker2d-v3 (Wal.), and Ant-v3 (Ant.), for RLx2 with two off-policy algorithms, TD3 and SAC."
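The quoted environments can be instantiated directly through the Gym API; a minimal sketch, assuming gym and the MuJoCo bindings required for the -v3 versions are installed:

```python
import gym

# The four MuJoCo continuous-control tasks named in the paper.
env_ids = ["HalfCheetah-v3", "Hopper-v3", "Walker2d-v3", "Ant-v3"]
envs = {env_id: gym.make(env_id) for env_id in env_ids}
```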
Dataset Splits | No | The paper describes training steps and evaluations based on average reward per episode, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) as commonly seen in supervised learning.
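To make the protocol concrete: in this online-RL setting, dataset splits are replaced by periodic evaluation rollouts. A minimal sketch of average-reward-per-episode evaluation, assuming the pre-0.26 Gym reset/step API used by the -v3 environments; the episode count of 10 is a common choice, not a number from the paper:

```python
def evaluate(policy, env, n_episodes=10):
    """Average reward per episode over n_episodes evaluation rollouts."""
    total = 0.0
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            # policy maps an observation to an action (deterministic at eval time).
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
    return total / n_episodes
```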
Hardware Specification | Yes | "Our experiments are implemented with PyTorch (Paszke et al., 2017) and run on 8x P100 GPUs."
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2017) but does not provide a specific version number for PyTorch or other key software dependencies.
Experiment Setup | Yes | "Table 4 presents detailed hyperparameters of RLx2-TD3 and RLx2-SAC in our experiments."
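Since Table 4 itself is not reproduced in this report, the following dataclass only illustrates how such a hyperparameter set might be organized for a rerun; every field name and value below is a placeholder, not a number taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class RLx2TD3Config:
    """Illustrative hyperparameter container; values are placeholders,
    NOT the numbers from Table 4 of the paper."""
    actor_sparsity: float = 0.9        # placeholder target sparsity
    critic_sparsity: float = 0.9       # placeholder target sparsity
    mask_update_interval: int = 1000   # steps between topology-evolution updates (assumed)
    batch_size: int = 256              # placeholder
    learning_rate: float = 3e-4        # placeholder
```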