Efficient Reinforcement Learning by Discovering Neural Pathways
Authors: Samin Yeasar Arnob, Riyasat Ohib, Sergey Plis, Amy Zhang, Alessandro Sordoni, Doina Precup
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that even very small learned sub-networks, using less than 5% of the large network's parameters, can provide very good quality solutions. We also demonstrate the training of multiple pathways within the same network in a multi-task setup, where each pathway tackles a separate task. We empirically evaluate our approach on several continuous control tasks, in both online and offline settings. |
| Researcher Affiliation | Collaboration | Samin Yeasar Arnob (Department of Computer Science, McGill University, Mila - Quebec AI Institute, samin.arnob@mail.mcgill.ca); Riyasat Ohib (Georgia Institute of Technology); Sergey Plis (Georgia State University); Amy Zhang (University of Texas at Austin); Alessandro Sordoni (Microsoft Research); Doina Precup (McGill University, Mila - Quebec AI Institute) |
| Pseudocode | Yes | Algorithm 1: Multi-task SAC with DAPD (SAC-DAPD). A hedged, illustrative sketch of the underlying pathway-masking idea is given after this table. |
| Open Source Code | Yes | We provide code to reproduce the result of our proposed method DAPD for single-task online RL. |
| Open Datasets | Yes | We conduct our experiment using Soft Actor-Critic (SAC) [29] on MuJoCo continuous control tasks. ... We compare the performance of our proposed method against various baselines on the Meta-World [98] MT10 benchmark. |
| Dataset Splits | No | The paper describes data collection and training processes but does not explicitly provide fixed, predefined training/test/validation dataset splits with percentages or sample counts in the manner of a static dataset partition. |
| Hardware Specification | No | We conducted experiments using various GPUs available at the time, which may have influenced the runtime; therefore, we did not provide specific details. The absence of specific details does not compromise the reproducibility of our experiments. |
| Software Dependencies | Yes | We run our algorithm in PyTorch 1.9.0 [58] and use the following libraries: Soft Actor-Critic (SAC) [95], Implicit Q-learning (IQL) [86], single-shot pruning (SNIP) [1], the official BCQ [22], and the RigL and RLx2 [82] implementations. |
| Experiment Setup | Yes | In Table 4 we present the network hyper-parameters of the different algorithms used in this work. In online Meta-World experiments, we deviate from the standard procedure and instead adopt the multi-task SAC hyperparameters suggested in the benchmark [77] for a fair comparison. Specifically, we employ neural networks with three layers, each containing 400 hidden units, and utilize a mini-batch size of 128. A hedged configuration sketch of these settings follows the table. |
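
The pseudocode entry above names Algorithm 1, multi-task SAC with DAPD (SAC-DAPD), which trains sparse "pathways" inside a dense network. The snippet below is a minimal, hypothetical sketch of the general pathway-masking idea only: a binary mask that keeps a small fraction of a policy network's weights (here 5%, echoing the "less than 5% of the parameters" claim quoted above). The magnitude-based scoring, the `keep_ratio` value, the layer sizes, and the helper names `make_pathway_masks` / `apply_pathway` are illustrative assumptions, not the paper's DAPD procedure.

```python
# Minimal sketch (not the authors' code): a neural "pathway" as a binary mask
# that keeps only a small fraction of a dense policy network's weights.
# The magnitude-based scoring and keep_ratio are illustrative assumptions.
import torch
import torch.nn as nn


def make_pathway_masks(model: nn.Module, keep_ratio: float = 0.05) -> dict:
    """Return a {param_name: binary mask} dict keeping the largest-magnitude weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                     # leave biases dense
            continue
        scores = param.detach().abs().flatten()
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = torch.topk(scores, k).values.min()
        masks[name] = (param.detach().abs() >= threshold).float()
    return masks


def apply_pathway(model: nn.Module, masks: dict) -> None:
    """Zero out every weight outside the pathway (in place)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])


if __name__ == "__main__":
    policy = nn.Sequential(nn.Linear(17, 400), nn.ReLU(),
                           nn.Linear(400, 400), nn.ReLU(),
                           nn.Linear(400, 6))
    masks = make_pathway_masks(policy, keep_ratio=0.05)
    apply_pathway(policy, masks)
    kept = sum(int(m.sum()) for m in masks.values())
    total = sum(m.numel() for m in masks.values())
    print(f"pathway keeps {kept / total:.1%} of the masked weights")
```

In a multi-task setting, one could hold a separate mask per task over the same dense parameters, which is one way the "multiple pathways within the same network" idea quoted above could be realized; the paper's actual pathway discovery and training procedure is given in its Algorithm 1.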
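
The experiment-setup entry quotes three layers of 400 hidden units and a mini-batch size of 128 for the online Meta-World MT10 runs. The sketch below only instantiates those quoted sizes; the observation and action dimensions, the ReLU activation, and the `mlp` helper are placeholder assumptions rather than values taken from the paper.

```python
# Hedged sketch of the quoted multi-task SAC network settings:
# three hidden layers of 400 units and a mini-batch size of 128.
# OBS_DIM / ACT_DIM and the activation are illustrative placeholders.
import torch.nn as nn

OBS_DIM, ACT_DIM = 39, 4        # placeholder Meta-World-style dimensions
HIDDEN, N_LAYERS = 400, 3       # quoted network size
BATCH_SIZE = 128                # quoted mini-batch size for replay sampling


def mlp(in_dim: int, out_dim: int,
        hidden: int = HIDDEN, layers: int = N_LAYERS) -> nn.Sequential:
    """Build an MLP with `layers` hidden layers of `hidden` units each."""
    dims = [in_dim] + [hidden] * layers
    blocks = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        blocks += [nn.Linear(d_in, d_out), nn.ReLU()]
    blocks.append(nn.Linear(dims[-1], out_dim))
    return nn.Sequential(*blocks)


actor = mlp(OBS_DIM, 2 * ACT_DIM)    # outputs mean and log-std per action dim
critic = mlp(OBS_DIM + ACT_DIM, 1)   # Q(s, a)
```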