Efficient Reinforcement Learning by Discovering Neural Pathways

Authors: Samin Yeasar Arnob, Riyasat Ohib, Sergey Plis, Amy Zhang, Alessandro Sordoni, Doina Precup

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically that even very small learned sub-networks, using less than 5% of the large network's parameters, can provide very good quality solutions. We also demonstrate the training of multiple pathways within the same networks in a multi-task setup, where each pathway tackles a separate task. We evaluate empirically our approach on several continuous control tasks, in both online and offline settings.
Researcher Affiliation | Collaboration | Samin Yeasar Arnob, Department of Computer Science, McGill University, Mila - Quebec AI Institute (samin.arnob@mail.mcgill.ca); Riyasat Ohib, Georgia Institute of Technology; Sergey Plis, Georgia State University; Amy Zhang, University of Texas at Austin; Alessandro Sordoni, Microsoft Research; Doina Precup, McGill University, Mila - Quebec AI Institute
Pseudocode | Yes | Algorithm 1: Multi-task SAC with DAPD (SAC-DAPD)
Open Source Code | Yes | We provide code to reproduce the results of our proposed method DAPD for single-task online RL.
Open Datasets | Yes | We conduct our experiment using Soft Actor-Critic (SAC) [29] on MuJoCo continuous control tasks. ... We compare the performance of our proposed method against various baselines on the Meta-World [98] MT10 benchmark.
Dataset Splits | No | The paper describes its data collection and training procedures but does not specify fixed, predefined training/validation/test splits (as percentages or sample counts) in the sense of a static dataset partition.
Hardware Specification | No | We conducted experiments using various GPUs available at the time, which may have influenced the runtime; therefore, we did not provide specific details. The absence of specific details does not compromise the reproducibility of our experiments.
Software Dependencies | Yes | We run our algorithm in PyTorch-1.9.0 [58] and use the following libraries: Soft Actor-Critic (SAC) [95], Implicit Q-learning (IQL) [86], Single-shot pruning (SNIP) [1], the official BCQ [22], and the RigL and RLx2 [82] implementations.
Experiment Setup | Yes | In Table 4 we present the network hyperparameters of the different algorithms used in this work. In online Meta-World experiments, we deviate from the standard procedure and instead adopt the SAC multi-task hyperparameters suggested in the benchmark [77] for fair comparison. Specifically, we employ neural networks with three layers, each containing 400 hidden units, and utilize a mini-batch size of 128. (A minimal illustrative sketch of such a masked multi-task network follows the table.)
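
The rows above mention sub-networks that use under 5% of the shared network's parameters, one pathway per task in the multi-task setup, and three-layer networks with 400 hidden units trained with mini-batches of 128. The following is a minimal PyTorch sketch of how a shared network with per-task binary masks could be wired up under those reported settings. It is not the authors' implementation: the class name, the random (rather than learned) masks, and the Meta-World-style dimensions (39-dimensional observations, 4-dimensional actions, 10 tasks) are illustrative assumptions only, and the paper's data-adaptive pathway discovery is not shown.

```python
# Hypothetical sketch, not the authors' code: a shared 3-layer MLP
# (400 hidden units per layer) with one fixed random binary mask per task,
# so each task's forward pass only uses ~5% of the shared weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, num_tasks, hidden=400, density=0.05):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(obs_dim, hidden),
            nn.Linear(hidden, hidden),
            nn.Linear(hidden, act_dim),
        ])
        # Random masks for illustration; the paper *learns* these pathways.
        for t in range(num_tasks):
            for i, layer in enumerate(self.layers):
                mask = (torch.rand_like(layer.weight) < density).float()
                self.register_buffer(f"mask_{t}_{i}", mask)

    def forward(self, obs, task_id):
        x = obs
        for i, layer in enumerate(self.layers):
            mask = getattr(self, f"mask_{task_id}_{i}")
            w = layer.weight * mask            # keep only this task's pathway
            x = F.linear(x, w, layer.bias)
            if i < len(self.layers) - 1:
                x = torch.relu(x)
        return torch.tanh(x)                   # squash to the action range


# Example usage with assumed Meta-World MT10-like dimensions and batch size 128.
policy = MaskedPolicy(obs_dim=39, act_dim=4, num_tasks=10)
action = policy(torch.randn(128, 39), task_id=3)
```

This only illustrates the pathway-masking idea at inference time; a full SAC-DAPD setup would additionally train a stochastic policy and critics, and learn the masks from data.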