Scalable Multitask Policy Gradient Reinforcement Learning

Authors: Salam El Bsat, Haitham Bou Ammar, Matthew Taylor

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We justify the correctness of our method both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks. We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015)." (A generic worked form of the O(1/k) rate is given after the table.)
Researcher Affiliation | Academia | Salam El Bsat (Rafik Hariri University), Haitham Bou Ammar (American University of Beirut), Matthew E. Taylor (Washington State University)
Pseudocode | Yes | Algorithm 1: Scalable Multitask Policy Search. (An illustrative, hypothetical sketch of a multitask policy-gradient loop is given after the table.)
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | "We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015)": the cart pole (CP) system..., the double inverted pendulum (DIP)..., a linearized model of a helicopter (HC)..., the simple mass (SM) system..., and the double mass (DM)...
Dataset Splits | No | The paper describes data generation ("We generated 150 tasks for each domain...") and how data is used per iteration ("...the learner observed a task through 50 trajectories of 150 steps and performed algorithmic updates."), but does not specify explicit training, validation, or test splits (e.g., an 80/10/10 split or k-fold cross-validation).
Hardware Specification | No | "To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes." This states the number of nodes but lacks specifics on the CPU or GPU models and memory of those nodes. (A hedged parallelization sketch in Python is given after the table.)
Software Dependencies | No | "To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes." This names the software but gives no version numbers for MATLAB or any other libraries.
Experiment Setup | No | The paper specifies the number of iterations (200), trajectories (50), and steps (150), and mentions the regularization terms μ1 and μ2, but does not provide numerical values for these hyperparameters or other training configuration such as learning rates, optimizers, or model initialization.
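
For context on the O(1/k) claim quoted above: that rate conventionally means the suboptimality after k iterations is bounded by a constant over k. A generic textbook form (not the paper's exact theorem statement) is:

```latex
% Generic O(1/k) rate: after k iterations, the gap between the iterate's
% objective value f(x_k) and the optimum f^\star shrinks at least as fast
% as a problem-dependent constant C over k.
f(x_k) - f^{\star} \le \frac{C}{k}, \qquad C > 0.
```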
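
The paper's Algorithm 1 (Scalable Multitask Policy Search) is given only as pseudocode. Below is a minimal, hypothetical Python sketch of a generic multitask policy-gradient (REINFORCE) loop on toy 1-D linear systems, reusing the paper's reported counts (200 iterations, 50 trajectories of 150 steps). The dynamics, linear-Gaussian policy form, round-robin task schedule, and learning rate are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, a_dyn, horizon=150, sigma=0.1):
    """One trajectory under a linear-Gaussian policy u = theta * x + noise."""
    x, states, actions, total_reward = 1.0, [], [], 0.0
    for _ in range(horizon):
        u = theta * x + sigma * rng.standard_normal()
        states.append(x)
        actions.append(u)
        total_reward += -(x ** 2 + 0.1 * u ** 2)  # quadratic cost as negative reward
        x = a_dyn * x + u  # hypothetical task-specific linear dynamics
    return np.array(states), np.array(actions), total_reward

def reinforce_grad(theta, a_dyn, n_traj=50, sigma=0.1):
    """REINFORCE gradient estimate, averaged over n_traj trajectories."""
    grad = 0.0
    for _ in range(n_traj):
        s, a, r = rollout(theta, a_dyn, sigma=sigma)
        # d/d(theta) of sum_t log N(a_t | theta * s_t, sigma^2), scaled by the return
        grad += r * np.sum((a - theta * s) * s) / sigma ** 2
    return grad / n_traj

tasks = [0.5, 0.6, 0.7, 0.8]   # hypothetical per-task dynamics coefficients
thetas = np.zeros(len(tasks))  # one policy parameter per task
for it in range(200):          # 200 iterations, as in the paper's reported setup
    t = it % len(tasks)        # observe one task per iteration (round-robin)
    thetas[t] += 1e-4 * reinforce_grad(thetas[t], tasks[t])
```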
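
The paper reports distributing computation via MATLAB's parallel pool on 10 nodes. As a loose Python analogue (hypothetical, not the authors' code), independent per-task computations can be farmed out to a 10-worker process pool:

```python
from concurrent.futures import ProcessPoolExecutor

def per_task_update(task_id):
    """Placeholder for an independent per-task computation (hypothetical),
    e.g., estimating a policy gradient for this task."""
    return task_id, 0.0

if __name__ == "__main__":
    # Mirror the 10-node MATLAB parallel pool with 10 worker processes,
    # over the 150 tasks per domain that the paper reports generating.
    with ProcessPoolExecutor(max_workers=10) as pool:
        results = dict(pool.map(per_task_update, range(150)))
```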