Distributed Multitask Reinforcement Learning with Quadratic Convergence

Authors: Rasul Tutunov, Dongho Kim, Haitham Bou Ammar

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyse the performance of our method both theoretically and empirically. On the theory side, we formally prove quadratic convergence. On the empirical side, we show that our new technique outperforms state-of-the-art methods from both distributed optimisation and lifelong reinforcement learning on a variety of graph topologies. [A note on quadratic convergence follows the table.]
Researcher Affiliation | Industry | Rasul Tutunov, PROWLER.io, Cambridge, United Kingdom (rasul@prowler.io); Dongho Kim, PROWLER.io, Cambridge, United Kingdom (dongho@prowler.io); Haitham Bou-Ammar, PROWLER.io, Cambridge, United Kingdom (haitham@prowler.io)
Pseudocode | No | The paper describes its solution steps in text but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code related to the methodology described.
Open Datasets | Yes | Our experiments ran on five systems, simple mass (SM), double mass (DM), cart-pole (CP), helicopter (HC), and humanoid robots (HR). We followed the experimental protocol in [10, 33] where we generated 5000 SM, 500 DM, and 1000 CP tasks by varying the dynamical parameters of each of the above systems. [A task-generation sketch follows the table.]
Dataset Splits | No | The paper describes generating and distributing tasks but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) or reference standard predefined splits for these tasks.
Hardware Specification | No | The paper mentions using 'MATLAB's parallel pool running on 10 nodes' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions the use of 'MATLAB' but does not provide specific version numbers for MATLAB or any other software dependencies required to replicate the experiments.
Experiment Setup | Yes | An ϵ = 1/100 was provided to the Chebyshev solver for determining the approximate Newton direction in all cases. Step-sizes were determined separately for each algorithm using a grid-search-like technique over {0.01, ..., 1} to ensure best operating conditions. [A solver and step-size sketch follows the table.]
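
For context on the Research Type row: quadratic convergence of an iterate sequence x_k to an optimum x* is the standard notion that, once the iterates are close enough to x*, the error is roughly squared at every step,

    ||x_{k+1} - x*||  ≤  C ||x_k - x*||^2,    for some constant C > 0 and all sufficiently large k,

so the number of correct digits roughly doubles per iteration. This is the generic textbook definition; the paper's formal statement and its constants are not reproduced here.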
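
The Open Datasets row quotes a protocol in which tasks are generated by varying each system's dynamical parameters. Below is a minimal Python sketch of that kind of task generation; the parameter names and sampling ranges are illustrative assumptions, not values from the paper or from the protocol of [10, 33].

    import numpy as np

    def generate_tasks(n_tasks, param_ranges, seed=0):
        """Sample one set of dynamical parameters per task, drawn uniformly
        from the given (low, high) ranges. Ranges here are placeholders."""
        rng = np.random.default_rng(seed)
        return [
            {name: rng.uniform(low, high) for name, (low, high) in param_ranges.items()}
            for _ in range(n_tasks)
        ]

    # Hypothetical ranges for the simple-mass (spring-mass-damper) system.
    sm_ranges = {"mass": (0.5, 5.0), "spring_constant": (1.0, 10.0), "damping": (0.01, 0.2)}

    # Task counts quoted above: 5000 SM, 500 DM, 1000 CP.
    sm_tasks = generate_tasks(5000, sm_ranges)

Each dictionary in sm_tasks would then parameterise one reinforcement-learning task instance.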
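
The Experiment Setup row mentions a Chebyshev solver, run to tolerance ϵ = 1/100, for computing an approximate Newton direction, together with a grid search over step-sizes in {0.01, ..., 1}. The sketch below is a generic, single-machine Chebyshev iteration for approximately solving the Newton system H d = -g, plus a simple step-size grid search; it assumes known spectral bounds lam_min and lam_max for H, uses illustrative matrices, and is not the paper's distributed implementation.

    import numpy as np

    def chebyshev_solve(H, b, lam_min, lam_max, eps=1e-2, max_iter=500):
        """Chebyshev iteration for H x = b, with H symmetric positive definite
        and eigenvalues assumed to lie in [lam_min, lam_max]. Stops once the
        relative residual falls below eps (the row above reports eps = 1/100)."""
        theta = 0.5 * (lam_max + lam_min)   # centre of the spectral interval
        delta = 0.5 * (lam_max - lam_min)   # half-width of the spectral interval
        sigma = theta / delta
        rho = 1.0 / sigma
        x = np.zeros_like(b)
        r = b - H @ x
        d = r / theta
        b_norm = np.linalg.norm(b)
        for _ in range(max_iter):
            x = x + d
            r = r - H @ d
            if np.linalg.norm(r) <= eps * b_norm:
                break
            rho_new = 1.0 / (2.0 * sigma - rho)
            d = rho_new * rho * d + (2.0 * rho_new / delta) * r
            rho = rho_new
        return x

    def grid_search_step_size(evaluate, candidates=np.arange(0.01, 1.01, 0.01)):
        """Return the candidate step-size scoring highest under `evaluate`
        (a user-supplied callable), mirroring the grid-search-like tuning
        over {0.01, ..., 1} described above."""
        return max(candidates, key=evaluate)

    # Usage sketch: approximate Newton direction d ≈ -H^{-1} g.
    H = np.array([[4.0, 1.0], [1.0, 3.0]])
    g = np.array([1.0, -2.0])
    newton_dir = chebyshev_solve(H, -g, lam_min=2.0, lam_max=5.0, eps=1e-2)

The recursion above is the standard Chebyshev semi-iterative update; the paper's setting additionally distributes this computation across learners over a graph, which the single-machine sketch does not capture.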