RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
Authors: Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate the state-of-the-art sparse training performance of RLx2 with two popular DRL algorithms, TD3 (Fujimoto et al., 2018) and SAC (Haarnoja et al., 2018), on several MuJoCo (Todorov et al., 2012) continuous control tasks. |
| Researcher Affiliation | Academia | Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. {tyq22, hph19}@mails.tsinghua.edu.cn, longbohuang@tsinghua.edu.cn |
| Pseudocode | Yes | The pseudo-code of our scheme is given in Algorithm 1, where ⊙ is the element-wise multiplication operator and Mθ is the binary mask to represent the sparse topology of the network θ. (...) Algorithm 1 Topology Evolution (Evci et al., 2020) |
| Open Source Code | Yes | The code is available at https://github.com/tyq1024/RLx2. |
| Open Datasets | Yes | Our experiments are conducted in four popular MuJoCo environments: HalfCheetah-v3 (Hal.), Hopper-v3 (Hop.), Walker2d-v3 (Wal.), and Ant-v3 (Ant.), for RLx2 with two off-policy algorithms, TD3 and SAC. |
| Dataset Splits | No | The paper describes training steps and evaluations based on average reward per episode, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) as commonly seen in supervised learning. |
| Hardware Specification | Yes | Our experiments are implemented with PyTorch (Paszke et al., 2017) and run on 8x P100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not provide a specific version number for PyTorch or other key software dependencies. |
| Experiment Setup | Yes | Table 4 presents detailed hyperparameters of RLx2-TD3 and RLx2-SAC in our experiments. |
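The Pseudocode row quotes the paper's core mechanism: a sparse network is represented as a dense weight tensor θ masked by a binary mask Mθ via element-wise multiplication. A minimal NumPy sketch of that masking idea is below; the `random_mask` helper and the 75% sparsity level are illustrative assumptions, not the paper's actual topology-evolution rule (which grows/prunes connections during training, per Evci et al., 2020).

```python
import numpy as np

def random_mask(shape, sparsity, rng):
    """Hypothetical helper: binary mask zeroing a `sparsity` fraction of entries."""
    mask = np.ones(int(np.prod(shape)))
    n_zero = int(sparsity * mask.size)
    mask[rng.choice(mask.size, size=n_zero, replace=False)] = 0.0
    return mask.reshape(shape)

rng = np.random.default_rng(0)
theta = rng.standard_normal((4, 4))        # dense weight matrix θ
M = random_mask(theta.shape, 0.75, rng)    # binary mask Mθ (keep 25% of weights)
sparse_theta = theta * M                   # element-wise multiplication θ ⊙ Mθ
```

In the paper's scheme the mask itself evolves over training (connections are periodically dropped and regrown), while the sketch above only shows the static masking step that defines the sparse topology.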