Dynamic Sparse Training for Deep Reinforcement Learning

Authors: Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on continuous control tasks show that our dynamic sparse agents achieve higher performance than the equivalent dense methods, reduce the parameter count and floating-point operations (FLOPs) by 50%, and have a faster learning speed that enables reaching the performance of dense agents with a 40-50% reduction in the training steps.
Researcher Affiliation | Collaboration | ¹Eindhoven University of Technology, The Netherlands; ²University of Twente, The Netherlands; ³The University of Texas at Austin / Sony AI, United States
Pseudocode | Yes | Algorithm 1 DS-TD3 (λl, η, e, N, τ, d) ... Algorithm 2 Topological Adaptation (X, M, η) ... Algorithm 3 Maintain Sparsity (X, k)
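The quoted Algorithm 2 performs a topological adaptation of the sparse networks. The paper's exact procedure is in the PDF; as a rough illustration only, the following is a minimal sketch of a SET-style drop-and-grow step (drop the weakest fraction η of active connections, grow the same number at random inactive positions). The function name, the random-reinitialization scale, and the NumPy formulation are all assumptions, not the authors' code.

```python
import numpy as np

def adapt_topology(weights, mask, eta, rng):
    """Hypothetical drop-and-grow step on one sparse weight matrix.

    Drops the fraction `eta` of active connections with the smallest
    magnitude, then grows the same number of new connections at random
    inactive positions, so the total connection count stays fixed.
    """
    active = np.flatnonzero(mask)
    n_drop = int(eta * active.size)
    if n_drop == 0:
        return weights, mask
    # Drop: zero out the weakest active connections.
    magnitudes = np.abs(weights.ravel()[active])
    drop_idx = active[np.argsort(magnitudes)[:n_drop]]
    mask.ravel()[drop_idx] = 0
    weights.ravel()[drop_idx] = 0.0
    # Grow: activate an equal number of random inactive positions,
    # initialized with small random values (assumed scale 0.01).
    inactive = np.flatnonzero(mask.ravel() == 0)
    grow_idx = rng.choice(inactive, size=n_drop, replace=False)
    mask.ravel()[grow_idx] = 1
    weights.ravel()[grow_idx] = rng.normal(0.0, 0.01, size=n_drop)
    return weights, mask

rng = np.random.default_rng(0)
mask = (rng.random((8, 8)) < 0.5).astype(np.int8)
weights = rng.normal(size=(8, 8)) * mask
n_before = int(mask.sum())
weights, mask = adapt_topology(weights, mask, eta=0.3, rng=rng)
print(int(mask.sum()) == n_before)  # connection count is preserved
```

Keeping the connection count fixed is what lets the parameter and FLOP budget stay constant throughout training while the topology itself evolves.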
Open Source Code | Yes | Code is available at: https://github.com/GhadaSokar/Dynamic-Sparse-Training-for-Deep-Reinforcement-Learning.
Open Datasets | Yes | We performed our experiments on MuJoCo continuous control tasks, interfaced through OpenAI Gym. We evaluate our proposed approach on five challenging environments (HalfCheetah-v3, Hopper-v3, Walker2d-v3, Ant-v3, and Humanoid-v3).
Dataset Splits | No | The paper uses standard continuous control tasks from MuJoCo environments, where data is generated through interaction. It does not explicitly define static training, validation, and test dataset splits with percentages or counts for these dynamic environments.
Hardware Specification | No | The paper mentions hardware in the context of prior work (e.g., "32 third-generation tensor processing units (TPUs)" used by Vinyals et al., 2019) but does not specify the hardware used for its own experiments.
Software Dependencies | No | The paper mentions using TD3, SAC, MuJoCo, and OpenAI Gym, but does not specify version numbers for these or any underlying programming languages or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Algorithm 1 DS-TD3 (λl, η, e, N, τ, d) ... We propose to delay the adaptation process and perform it every e time steps, where e is a hyperparameter. ... We analyze the effect of the adaptation schedule on the performance. ... Figure 2a shows the learning curves of DS-TD3 using different adaptation schedules controlled by the hyperparameter e (Section 3). Adapting the topology very frequently (e ∈ {200, 500}) ... With less frequent adaptation cycles, e = 1000 ... We analyze the performance of our proposed method using different sparsity levels. ... with a sparsity level of 51%. ... By removing 25% of the connections ... with a higher reduction in the size of the networks by 50%, ... when the network has a very high sparsity level (i.e., 80%).
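The quoted setup varies the sparsity level (e.g., 25%, 51%, 80%) and reports the corresponding reduction in parameters and FLOPs. As a back-of-the-envelope illustration only (the function name and random-mask construction are assumptions, not the paper's initialization scheme), a sparse linear layer's savings follow directly from the fraction of inactive weights, since a dense forward pass costs roughly one multiply-accumulate per weight:

```python
import numpy as np

def sparse_linear_stats(in_dim, out_dim, sparsity, rng):
    """Hypothetical random sparse mask for a linear layer.

    Returns the binary mask and the resulting parameter/FLOP reduction
    versus the dense layer (one multiply-accumulate per active weight).
    """
    n_total = in_dim * out_dim
    n_active = int(round((1.0 - sparsity) * n_total))
    mask = np.zeros(n_total, dtype=np.int8)
    # Activate a uniformly random subset of positions at the target density.
    mask[rng.choice(n_total, size=n_active, replace=False)] = 1
    reduction = 1.0 - n_active / n_total
    return mask.reshape(out_dim, in_dim), reduction

rng = np.random.default_rng(0)
mask, reduction = sparse_linear_stats(256, 256, sparsity=0.5, rng=rng)
print(int(mask.sum()))  # 32768 active weights out of 65536
print(reduction)        # 0.5, i.e., the 50% reduction cited above
```

This is why a 50% sparsity level translates one-to-one into the paper's reported 50% reduction in parameter count and FLOPs.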