Scalable Multitask Policy Gradient Reinforcement Learning
Authors: Salam El Bsat, Haitham Bou Ammar, Matthew Taylor
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We justify the correctness of our method both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks. We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015). |
| Researcher Affiliation | Academia | Salam El Bsat (Rafik Hariri University); Haitham Bou Ammar (American University of Beirut); Matthew E. Taylor (Washington State University) |
| Pseudocode | Yes | Algorithm 1 Scalable Multitask Policy Search |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015). The cart pole (CP) system..., The double inverted pendulum (DIP)..., A linearized model of a helicopter (HC)..., The simple mass (SM) system..., The double mass (DM)... |
| Dataset Splits | No | The paper describes data generation ("We generated 150 tasks for each domain...") and how data is used per iteration ("...the learner observed a task through 50 trajectories of 150 steps and performed algorithmic updates."), but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 split, k-fold cross-validation). |
| Hardware Specification | No | "To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes." This mentions the number of nodes but lacks specific details on the CPU or GPU models, or the memory, of these nodes. |
| Software Dependencies | No | "To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes." This specifies software by name but lacks version numbers for MATLAB or any other libraries. |
| Experiment Setup | No | The paper specifies the number of iterations (200), trajectories (50), and steps (150). It also mentions the regularization terms with μ1 and μ2, but does not provide specific numerical values for these hyperparameters or other training configurations like learning rates, optimizers, or model initialization details. |
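For the Dataset Splits row: the paper generates 150 tasks per domain but reports no train/validation/test partition. As an illustration of the kind of explicit split the review finds missing (an assumed 80/10/10 ratio, not anything stated in the paper), a minimal sketch:

```python
import random

def split_tasks(tasks, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a task list and partition it into train/validation/test subsets.

    The 80/10/10 ratio is purely illustrative; the paper specifies no split.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = tasks[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# 150 tasks per domain, as generated in the paper's experiments
train_set, val_set, test_set = split_tasks(list(range(150)))
print(len(train_set), len(val_set), len(test_set))  # 120 15 15
```

Reporting such a partition (and the seed) is what would let the per-domain results be reproduced from the same task pool.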