Dynamic Sparse Training for Deep Reinforcement Learning
Authors: Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on continuous control tasks show that our dynamic sparse agents achieve higher performance than the equivalent dense methods, reduce the parameter count and floating-point operations (FLOPs) by 50%, and have a faster learning speed that enables reaching the performance of dense agents with a 40-50% reduction in the training steps. |
| Researcher Affiliation | Collaboration | ¹Eindhoven University of Technology, The Netherlands; ²University of Twente, The Netherlands; ³The University of Texas at Austin, Sony AI, United States |
| Pseudocode | Yes | Algorithm 1 DS-TD3 (λl, η, e, N, τ, d) ... Algorithm 2 Topological Adaptation (X, M, η) ... Algorithm 3 Maintain Sparsity (X, k) |
| Open Source Code | Yes | Code is available at: https://github.com/GhadaSokar/Dynamic-Sparse-Training-for-Deep-Reinforcement-Learning. |
| Open Datasets | Yes | We performed our experiments on MuJoCo continuous control tasks, interfaced through OpenAI Gym. We evaluate our proposed approach on five challenging environments (HalfCheetah-v3, Hopper-v3, Walker2d-v3, Ant-v3, and Humanoid-v3). |
| Dataset Splits | No | The paper uses standard continuous control tasks from MuJoCo environments, where data is generated through interaction. It does not explicitly define static training, validation, and test dataset splits with percentages or counts for these dynamic environments. |
| Hardware Specification | No | The paper mentions hardware in the context of prior work (e.g., "32 third-generation tensor processing units (TPUs)" used by Vinyals et al., 2019) but does not specify the hardware used for its own experiments. |
| Software Dependencies | No | The paper mentions using TD3, SAC, MuJoCo, and OpenAI Gym, but does not specify version numbers for these or any underlying programming languages or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Algorithm 1 DS-TD3 (λl, η, e, N, τ, d) ... We propose to delay the adaptation process and perform it every e time steps, where e is a hyperparameter. ... We analyze the effect of the adaptation schedule on the performance. ... Figure 2a shows the learning curves of DS-TD3 using different adaptation schedules controlled by the hyperparameter e (Section 3). Adapting the topology very frequently (e ∈ {200, 500}) ... With less frequent adaptation cycles, e = 1000 ... We analyze the performance of our proposed method using different sparsity levels. ... with a sparsity level of 51%. ... By removing 25% of the connections ... with a higher reduction in the size of the networks by 50%, ... when the network has a very high sparsity level (i.e., 80%). |
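
The Pseudocode row above lists Algorithm 2 (Topological Adaptation) only by name. Below is a minimal PyTorch sketch of a drop-and-grow adaptation step in the spirit of dynamic sparse training: it assumes magnitude-based dropping and random regrowth, and the function name `adapt_topology` and its in-place mask handling are illustrative, not the authors' implementation.

```python
import torch

def adapt_topology(weight, mask, eta):
    """One drop-and-grow adaptation step for a single sparse layer (illustrative sketch).

    `weight` and `mask` are same-shape, contiguous tensors (e.g. `layer.weight.data`
    and its binary mask); `eta` is the fraction of active connections to replace.
    """
    with torch.no_grad():
        w, m = weight.view(-1), mask.view(-1)
        active = m.nonzero(as_tuple=True)[0]
        n_change = int(eta * active.numel())

        # Drop: remove the active connections with the smallest weight magnitude.
        drop = active[torch.argsort(w[active].abs())[:n_change]]
        m[drop] = 0.0
        w[drop] = 0.0

        # Grow: activate the same number of currently inactive connections at random
        # (excluding the ones just dropped), so the layer's sparsity level is preserved.
        inactive = (m == 0).nonzero(as_tuple=True)[0]
        candidates = inactive[~torch.isin(inactive, drop)]
        grow = candidates[torch.randperm(candidates.numel())[:n_change]]
        m[grow] = 1.0
    return weight, mask
```

In the paper's setting this adaptation is triggered only every e time steps of the TD3/SAC training loop, rather than after every gradient update.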
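
The Experiment Setup row also refers to fixed sparsity levels (around 50%) and to Algorithm 3 (Maintain Sparsity), which keeps removed connections at zero between adaptations. The toy fragment below sketches one way to build a random sparse mask and re-apply it after each dense gradient step; the helper names (`make_sparse_mask`, `maintain_sparsity`) and the toy regression objective are assumptions for illustration, not the paper's code.

```python
import math
import torch

def make_sparse_mask(shape, sparsity):
    """Random binary mask with a `sparsity` fraction of zeroed connections."""
    n = math.prod(shape)
    k = n - int(sparsity * n)                      # number of active connections
    mask = torch.zeros(n)
    mask[torch.randperm(n)[:k]] = 1.0
    return mask.view(shape)

def maintain_sparsity(param, mask):
    """Zero out every weight outside the mask after a dense gradient update."""
    with torch.no_grad():
        param.mul_(mask)

# Toy usage: one sparse linear layer trained on a random regression target.
layer = torch.nn.Linear(64, 64)
mask = make_sparse_mask(layer.weight.shape, sparsity=0.5)   # roughly the paper's ~50% level
maintain_sparsity(layer.weight, mask)

opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
for _ in range(100):
    x, y = torch.randn(32, 64), torch.randn(32, 64)
    loss = torch.nn.functional.mse_loss(layer(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    maintain_sparsity(layer.weight, mask)          # re-apply the mask after every update
```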