The State of Sparse Training in Deep Reinforcement Learning

Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain: sparse networks perform better than dense networks for the same parameter count in the DRL domain.
Researcher Affiliation | Industry | Laura Graesser *1,2, Utku Evci *2, Erich Elsen 3, Pablo Samuel Castro 2. 1 Robotics at Google; 2 Google Research, Canada; 3 Adept.
Pseudocode | No | The paper describes various algorithms (e.g., DQN, PPO, SAC, pruning, SET, RigL) in text, but it does not include any formal pseudocode blocks or figures explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | Code for reproducing our results can be found at github.com/google-research/rigl/tree/master/rigl/rl
Open Datasets | Yes | For discrete-control we focus on three classic control environments (CartPole, Acrobot, and MountainCar) as well as 15 games from the ALE Atari suite (Bellemare et al., 2013) (see subsection A.4 for game selection details). For continuous-control we use five environments of varying difficulty from the MuJoCo suite (Todorov et al., 2012) (HalfCheetah, Hopper, Walker2d, Ant, and Humanoid).
Dataset Splits | No | The paper specifies training steps and evaluation intervals ('During training all agents are allowed M environment transitions, with policies being evaluated for K episodes / steps every N environment frames') and reports average rewards over the last 10% of evaluations. However, it does not explicitly describe distinct training, validation, and test dataset splits with percentages or sample counts in the conventional supervised learning sense.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or TPU versions.
Software Dependencies | No | Our code is built upon the TF-Agents (Guadarrama et al., 2018), Dopamine (Castro et al., 2018), and RigL (Evci et al., 2020) codebases. We use rliable (Agarwal et al., 2021) to calculate the interquartile mean (IQM) and plot the results. The paper names the software libraries and frameworks used but does not provide specific version numbers for them.
Experiment Setup | Yes | We perform a grid search over different hyperparameters used in the Dense, Prune, Static, SET and RigL algorithms. Unless otherwise noted, we use the hyperparameters used in regular dense training. ... We search over the following parameters: 1. Weight decay: searched over the grid [0, 1e-6, 1e-4, 1e-3]. 2. Update interval: how often models are pruned or the sparse topology is updated; searched over the grid [100, 250, 500, 1000, 5000]. 3. Drop fraction: the maximum percentage of parameters that are dropped and added when the network topology is updated. ... 4. Sparsity-aware initialization.
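
Although the paper itself contains no pseudocode (see the Pseudocode row above), the topology-update step shared by SET and RigL is compact enough to sketch. The NumPy fragment below is a minimal, illustrative sketch only: the function and variable names are ours, not taken from the paper or its code release. SET grows connections at random positions, while RigL grows the connections with the largest gradient magnitude.

```python
# Minimal sketch of a drop-and-grow topology update in the style of SET / RigL.
# Names are illustrative and do not come from the paper's code release.
import numpy as np

def drop_and_grow(weights, mask, grads, drop_fraction, use_gradients=True):
    """Drop the smallest-magnitude active weights, then grow the same number.

    weights, grads: dense arrays holding the layer's parameters and gradients.
    mask: binary array of the same shape (1 = active connection).
    use_gradients: True approximates RigL (grow by gradient magnitude),
    False approximates SET (grow at random inactive positions).
    """
    n_active = int(mask.sum())
    n_update = int(drop_fraction * n_active)

    # Drop: deactivate the active connections with the smallest weight magnitude.
    active_scores = np.where(mask == 1, np.abs(weights), np.inf)
    drop_idx = np.argsort(active_scores, axis=None)[:n_update]
    mask.flat[drop_idx] = 0

    # Grow: activate an equal number of currently inactive connections.
    if use_gradients:
        grow_scores = np.where(mask == 0, np.abs(grads), -np.inf)
    else:
        grow_scores = np.where(mask == 0, np.random.rand(*mask.shape), -np.inf)
    grow_idx = np.argsort(grow_scores, axis=None)[::-1][:n_update]
    mask.flat[grow_idx] = 1

    # Newly grown connections start at zero, as in RigL.
    weights.flat[grow_idx] = 0.0
    return weights * mask, mask
```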
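
For reference, the environments listed in the Open Datasets row correspond to standard Gym registry IDs. The IDs and version suffixes below are assumptions (the paper does not quote them), and the MuJoCo environments additionally require a MuJoCo installation.

```python
# Hypothetical sketch: instantiating the named environments via the Gym registry.
# Environment IDs and version suffixes are assumptions, not quoted from the paper.
import gym

CLASSIC_CONTROL = ["CartPole-v1", "Acrobot-v1", "MountainCar-v0"]
MUJOCO = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2", "Ant-v2", "Humanoid-v2"]

envs = {name: gym.make(name) for name in CLASSIC_CONTROL + MUJOCO}
for name, env in envs.items():
    print(name, env.observation_space, env.action_space)
```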
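
The evaluation protocol quoted in the Dataset Splits row (evaluate every N frames, report the average reward over the last 10% of evaluations) reduces to a simple aggregation. A minimal sketch with illustrative names:

```python
# Sketch of the reported metric: mean evaluation return over the final 10%
# of evaluation points of a training run. Names are illustrative.
import numpy as np

def final_performance(eval_returns, last_fraction=0.10):
    """eval_returns: 1-D array of per-evaluation average returns, in order."""
    eval_returns = np.asarray(eval_returns)
    n_last = max(1, int(np.ceil(last_fraction * len(eval_returns))))
    return eval_returns[-n_last:].mean()

# Example: 200 evaluation points -> average of the last 20.
print(final_performance(np.linspace(0.0, 100.0, 200)))
```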
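
The IQM reported via rliable (Software Dependencies row) can be computed following the library's documented usage. The score-array layout (runs by games) follows rliable's convention; the example values and method names in the dictionary below are made up for illustration.

```python
# Sketch of computing the interquartile mean (IQM) with rliable,
# with stratified bootstrap confidence intervals. Scores are synthetic.
import numpy as np
from rliable import metrics, library as rly

# Normalized scores: rows are independent runs/seeds, columns are games.
score_dict = {"SET": np.random.rand(5, 15), "RigL": np.random.rand(5, 15)}

iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, iqm, reps=2000)
print(point_estimates, interval_estimates)
```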
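
Finally, the hyperparameter grid in the Experiment Setup row can be enumerated directly. In the sketch below, the weight-decay and update-interval values are the ones quoted above; the drop-fraction and sparsity-aware-initialization grids are placeholders, since their values are elided in the excerpt.

```python
# Sketch of enumerating the hyperparameter grid described in the table above.
import itertools

grid = {
    "weight_decay": [0, 1e-6, 1e-4, 1e-3],
    "update_interval": [100, 250, 500, 1000, 5000],
    "drop_fraction": [None],       # placeholder: grid values not quoted here
    "sparse_init": [False, True],  # placeholder: options not quoted here
}

configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(configs), "configurations, e.g.", configs[0])
```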