Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable Multitask Policy Gradient Reinforcement Learning
Authors: Salam El Bsat, Haitham Bou Ammar, Matthew Taylor
AAAI 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We justify the correctness of our method both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks. We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015). (The O(1/k) claim is restated schematically below the table.) |
| Researcher Affiliation | Academia | Salam El Bsat (Rafik Hariri University); Haitham Bou Ammar (American University of Beirut); Matthew E. Taylor (Washington State University) |
| Pseudocode | Yes | Algorithm 1 Scalable Multitask Policy Search |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015): the cart pole (CP) system...; the double inverted pendulum (DIP)...; a linearized model of a helicopter (HC)...; the simple mass (SM) system...; the double mass (DM)... |
| Dataset Splits | No | The paper describes data generation ("We generated 150 tasks for each domain...") and how data is used per iteration ("...the learner observed a task through 50 trajectories of 150 steps and performed algorithmic updates."), but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 split, k-fold cross-validation). |
| Hardware Specification | No | To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes. This mentions the number of nodes but lacks specific details on the CPU, GPU models, or memory of these nodes. |
| Software Dependencies | No | To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes. This specifies software by name but lacks version numbers for MATLAB or any other libraries. |
| Experiment Setup | No | The paper specifies the number of iterations (200), trajectories (50), and steps (150). It also mentions regularization terms weighted by μ1 and μ2, but does not provide numerical values for these hyperparameters or other training configuration details such as learning rates, optimizers, or model initialization (a hedged sketch of the reported setup follows the table). |
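The Research Type row quotes an O(1/k) convergence-rate claim. For readers scanning the table, here is a schematic LaTeX restatement; the symbols f, θ_k, θ*, and the constant C are generic placeholders, since the excerpt does not give the paper's actual objective or notation:

```latex
% Schematic restatement of the quoted O(1/k) claim. f, \theta_k,
% \theta^\star, and C are generic placeholders, not the paper's notation.
f(\theta_k) - f(\theta^\star) \;\le\; \frac{C}{k}, \qquad k = 1, 2, \ldots
```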
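The Dataset Splits and Experiment Setup rows quote the only configuration numbers the paper reports: five benchmark domains (CP, DIP, HC, SM, DM), 150 tasks per domain, 200 iterations, and 50 trajectories of 150 steps per observed task. The minimal Python sketch below strings those numbers into one loop so the reported setup is concrete. Everything else is a hypothetical placeholder: the toy dynamics in `rollout`, the generic perturbation-based gradient estimator (not the paper's update rule), the policy dimension, the learning rate, and the μ1/μ2 values, none of which the excerpts supply.

```python
"""Hedged sketch of the experiment loop described in the quoted excerpts.

Only the counts are taken from the paper: five benchmark domains, 150
tasks per domain, 200 iterations, and 50 trajectories of 150 steps per
observed task. Dynamics, the gradient estimator, the learning rate, and
the mu1/mu2 values are placeholders the paper does not specify.
"""
import numpy as np

DOMAINS = ["CP", "DIP", "HC", "SM", "DM"]  # five benchmark dynamical systems
N_TASKS = 150         # "We generated 150 tasks for each domain..."
N_ITERATIONS = 200    # iterations per domain, as quoted in the table
N_TRAJECTORIES = 50   # "...50 trajectories of 150 steps..."
HORIZON = 150         # steps per trajectory (unused by the toy dynamics)
MU1, MU2 = 0.1, 0.1   # hypothetical: the paper gives no numerical values

rng = np.random.default_rng(0)


def rollout(theta, task):
    """Stand-in for one HORIZON-step trajectory on task `task`.

    Real returns come from the benchmark dynamical systems; this
    placeholder ignores the task index and rewards small parameters.
    """
    del task
    return float(rng.normal(loc=-np.sum(theta ** 2), scale=1.0))


def run_domain(domain, dim=4, lr=1e-2, sigma=0.1):
    """One domain: observe a task per iteration, update shared parameters."""
    theta = rng.normal(size=dim)  # policy parameters (dimension assumed)
    for _ in range(N_ITERATIONS):
        task = int(rng.integers(N_TASKS))  # the task observed this iteration
        # Generic perturbation-based gradient estimate from 50 rollouts
        # (NOT the paper's update rule, which the excerpts do not give).
        eps = rng.normal(size=(N_TRAJECTORIES, dim))
        rets = np.array([rollout(theta + sigma * e, task) for e in eps])
        grad = (eps * (rets - rets.mean())[:, None]).mean(axis=0) / sigma
        # Hypothetical regularized step: two penalties weighted by mu1/mu2
        # mimic the unquantified regularization terms the paper mentions.
        theta = theta + lr * (grad - MU1 * theta - MU2 * np.sign(theta))
    return theta


if __name__ == "__main__":
    for domain in DOMAINS:
        theta = run_domain(domain)
        print(f"{domain}: final ||theta|| = {np.linalg.norm(theta):.3f}")
```

The placeholders are exactly the gaps the table flags: a faithful reproduction would need the paper's actual μ1/μ2 values, its policy-gradient update, optimizer settings, and initialization scheme.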