The State of Sparse Training in Deep Reinforcement Learning
Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain: sparse networks perform better than dense networks for the same parameter count in the DRL domain. |
| Researcher Affiliation | Industry | Laura Graesser*¹,², Utku Evci*², Erich Elsen³, Pablo Samuel Castro² (¹Robotics at Google, ²Google Research, Canada, ³Adept). |
| Pseudocode | No | The paper describes various algorithms (e.g., DQN, PPO, SAC, Pruning, SET, RigL) in text, but it does not include any formal pseudocode blocks or figures explicitly labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Code for reproducing our results can be found at github.com/google-research/rigl/tree/master/rigl/rl |
| Open Datasets | Yes | For discrete-control we focus on three classic control environments (CartPole, Acrobot, and MountainCar) as well as 15 games from the ALE Atari suite (Bellemare et al., 2013) (see subsection A.4 for game selection details). For continuous-control we use five environments of varying difficulty from the MuJoCo suite (Todorov et al., 2012) (HalfCheetah, Hopper, Walker2d, Ant, and Humanoid). |
| Dataset Splits | No | The paper specifies training steps and evaluation intervals ('During training all agents are allowed M environment transitions, with policies being evaluated for K episodes / steps every N environment frames'), and reports average rewards over the last 10% of evaluations (this reporting rule is sketched after the table). However, it does not explicitly describe distinct training, validation, and test *dataset splits* with percentages or sample counts for data partitioning in the conventional supervised learning sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or TPU versions. |
| Software Dependencies | No | Our code is built upon the TF-Agents (Guadarrama et al., 2018), Dopamine (Castro et al., 2018), and RigL (Evci et al., 2020) codebases. We use rliable (Agarwal et al., 2021) to calculate the interquartile mean (IQM) and plot the results. The paper names the software libraries and frameworks used but does not provide specific version numbers for them (an IQM aggregation sketch follows the table). |
| Experiment Setup | Yes | We perform a grid search over different hyperparameters used in the Dense, Prune, Static, SET and RigL algorithms. Unless otherwise noted, we use the hyperparameters used in regular dense training. ... We search over the following parameters: 1. Weight decay: searched over the grid [0, 1e-6, 1e-4, 1e-3]. 2. Update Interval: refers to how often models are pruned or the sparse topology is updated; searched over the grid [100, 250, 500, 1000, 5000]. 3. Drop Fraction: refers to the maximum percentage of parameters that are dropped and added when network topology is updated. ... 4. Sparsity-aware initialization. (An illustrative grid-search sketch follows the table.) |
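
For concreteness, the reporting rule quoted under Dataset Splits (average reward over the last 10% of evaluations) can be sketched as below. The helper name `final_score` and the 1-D array layout are assumptions for illustration, not code from the authors' repository.

```python
import numpy as np

def final_score(eval_returns):
    """Mean return over the last 10% of periodic evaluations.

    `eval_returns` is assumed to be a 1-D sequence of average returns,
    one entry per evaluation round, in chronological order.
    """
    eval_returns = np.asarray(eval_returns, dtype=float)
    tail = max(1, int(np.ceil(0.1 * len(eval_returns))))  # at least one evaluation
    return float(eval_returns[-tail:].mean())
```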
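The Software Dependencies row mentions rliable for interquartile-mean (IQM) aggregation. A minimal sketch of that computation, using rliable's documented API (Agarwal et al., 2021) on a hypothetical score dictionary with one (runs, games) array per algorithm, might look like this; the algorithm names and array shapes are illustrative only.

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Hypothetical normalized scores: algorithm name -> array of shape (runs, games).
score_dict = {
    "Dense": np.random.rand(5, 15),
    "RigL": np.random.rand(5, 15),
}

# IQM point estimates with stratified-bootstrap confidence intervals.
iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, iqm, reps=2000)
```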
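Finally, the grid search quoted under Experiment Setup amounts to a Cartesian product over the listed values. In the sketch below, only the weight-decay and update-interval grids come from the excerpt; the dictionary keys, the drop-fraction values, the boolean treatment of sparsity-aware initialization, and the `train_agent` call are hypothetical placeholders (the paper's drop-fraction grid is elided in the quote above).

```python
import itertools

# Weight-decay and update-interval values are taken from the excerpt above;
# drop_fraction values and the sparsity-aware-init flag are placeholders.
GRID = {
    "weight_decay": [0.0, 1e-6, 1e-4, 1e-3],
    "update_interval": [100, 250, 500, 1000, 5000],
    "drop_fraction": [0.1, 0.3, 0.5],          # placeholder values
    "sparsity_aware_init": [False, True],      # on/off, as a boolean flag
}

def grid_configs(grid):
    """Yield one configuration dict per point in the Cartesian product."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

for config in grid_configs(GRID):
    ...  # e.g. launch one training run per configuration (hypothetical train_agent(config))
```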