Sharing Knowledge in Multi-Task Deep Reinforcement Learning

Authors: Carlo D'Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We prove this by providing theoretical guarantees that highlight the conditions for which it is convenient to share representations among tasks, extending the well-known finite-time bounds of Approximate Value-Iteration to the multi-task setting. In addition, we complement our analysis by proposing multi-task extensions of three Reinforcement Learning algorithms that we empirically evaluate on widely used Reinforcement Learning benchmarks, showing significant improvements over the single-task counterparts in terms of sample efficiency and performance.
Researcher Affiliation Academia Carlo D'Eramo & Davide Tateo, Department of Computer Science, TU Darmstadt, IAS, Hochschulstraße 10, 64289 Darmstadt, Germany, {carlo.deramo,davide.tateo}@tu-darmstadt.de; Andrea Bonarini & Marcello Restelli, Politecnico di Milano, DEIB, Piazza Leonardo da Vinci 32, 20133 Milano, {andrea.bonarini,marcello.restelli}@polimi.it; Jan Peters, TU Darmstadt, IAS, Hochschulstraße 10, 64289 Darmstadt, Germany, and Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tübingen, Germany, jan.peters@tu-darmstadt.de
Pseudocode No The paper describes algorithms and architectures textually but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code Yes Experiments have been developed using the MushroomRL library (D'Eramo et al., 2020).
Open Datasets Yes We consider the Car-On-Hill problem as described in Ernst et al. (2005)... The implementation of the first three problems is the one provided by the OpenAI Gym library Brockman et al. (2016)... The two sets of problems we consider for this experiment are: one including Inverted-Pendulum, Inverted-Double-Pendulum, and Inverted-Pendulum-Swingup, and another one including Hopper-Stand, Walker-Walk, and Half-Cheetah-Run. ... The IDs of the problems in the PyBullet library are: InvertedPendulumBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0, and InvertedPendulumSwingupBulletEnv-v0. The names of the domain and the task of the problems in the DeepMind Control Suite are: hopper-stand, walker-walk, and cheetah-run. (A hedged sketch of how these environments can be instantiated follows the table.)
Dataset Splits No The paper does not explicitly state training, validation, or test dataset splits using percentages or specific sample counts. It describes evaluation processes in terms of steps or epochs, but not data partitioning for reproduction.
Hardware Specification Yes Experiments have been developed using the MushroomRL library (D'Eramo et al., 2020), and run on an NVIDIA® DGX Station™ and Intel® AI DevCloud.
Software Dependencies No The paper mentions software such as the MushroomRL library, the OpenAI Gym library, the PyBullet library, and the DeepMind Control Suite, but it does not provide specific version numbers for any of these components.
Experiment Setup Yes Running Adam optimizer with learning rate 0.001 and using a mean squared loss, we train a neural network composed of 2 shared layers of 30 neurons each, with sigmoidal activation function... The discount factors are respectively 0.99, 0.99, 0.99, 0.95, and 0.95. The horizons are respectively 500, 1,000, 1,000, 100, and 3,000. The network we use consists of 80 ReLU units for each w_t, t ∈ {1, ..., T} block, with T = 5. Then, the shared block h consists of one layer with 80 ReLU units and another one with 80 sigmoid units... The initial replay memory size for each task is 100 and the maximum size is 5,000. We use Huber loss with Adam optimizer using learning rate 10^-3 and batch size of 100 samples for each task. The target network is updated every 100 steps. The exploration is ε-greedy with ε linearly decaying from 1 to 0.01 in the first 5,000 steps. (A sketch of the described multi-task network follows the table.)
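The environment IDs and DeepMind Control Suite tasks quoted under Open Datasets map directly onto public APIs. Below is a minimal Python sketch, assuming recent gym, pybullet_envs, and dm_control packages; the paper does not state versions, so the exact import paths and registration behavior are assumptions rather than the authors' setup.

import gym
import pybullet_envs  # noqa: F401 -- importing this module registers the *BulletEnv-v0 IDs with Gym
from dm_control import suite

# PyBullet environment IDs quoted in the paper.
PYBULLET_IDS = [
    "InvertedPendulumBulletEnv-v0",
    "InvertedDoublePendulumBulletEnv-v0",
    "InvertedPendulumSwingupBulletEnv-v0",
]

# DeepMind Control Suite (domain, task) pairs quoted in the paper.
DMC_TASKS = [("hopper", "stand"), ("walker", "walk"), ("cheetah", "run")]

bullet_envs = [gym.make(env_id) for env_id in PYBULLET_IDS]
dmc_envs = [suite.load(domain_name=domain, task_name=task) for domain, task in DMC_TASKS]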
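The Experiment Setup row describes the multi-task network only in prose. The following PyTorch sketch (not the authors' code) reproduces that description: one 80-unit ReLU input block w_t per task with T = 5, a shared block h of one 80-unit ReLU layer and one 80-unit sigmoid layer, and the quoted Adam learning rate and Huber loss. The per-task linear output heads and the input/action sizes are illustrative assumptions, not values stated in the paper.

import torch
import torch.nn as nn


class MultiTaskQNetwork(nn.Module):
    def __init__(self, input_dims, n_actions, n_features=80):
        super().__init__()
        # One task-specific input block w_t per task: 80 ReLU units each.
        self.w = nn.ModuleList(
            nn.Sequential(nn.Linear(d, n_features), nn.ReLU()) for d in input_dims
        )
        # Shared block h: one 80-unit ReLU layer followed by one 80-unit sigmoid layer.
        self.h = nn.Sequential(
            nn.Linear(n_features, n_features), nn.ReLU(),
            nn.Linear(n_features, n_features), nn.Sigmoid(),
        )
        # One output head per task (assumed linear Q-value heads).
        self.out = nn.ModuleList(nn.Linear(n_features, a) for a in n_actions)

    def forward(self, state, task_idx):
        features = self.h(self.w[task_idx](state))
        return self.out[task_idx](features)


# Optimizer and loss as quoted: Adam with learning rate 1e-3 and Huber loss.
# input_dims and n_actions below are placeholders for the five tasks.
net = MultiTaskQNetwork(input_dims=[4, 4, 6, 2, 8], n_actions=[3, 3, 3, 2, 3])
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.SmoothL1Loss()  # Huber loss

Per the quoted setup, a target copy of this network would be synchronized every 100 steps and ε would decay linearly from 1 to 0.01 over the first 5,000 steps; those training-loop details are omitted from the sketch.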