Scalable Multitask Policy Gradient Reinforcement Learning
Authors: Salam El Bsat, Haitham Bou Ammar, Matthew Taylor
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We justify the correctness of our method both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks. We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015). |
| Researcher Affiliation | Academia | Salam El Bsat (Rafik Hariri University); Haitham Bou Ammar (American University of Beirut); Matthew E. Taylor (Washington State University) |
| Pseudocode | Yes | Algorithm 1 Scalable Multitask Policy Search |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We empirically validate our method on five existing benchmark dynamical systems (Bou Ammar et al. 2015). The cart pole (CP) system..., The double inverted pendulum (DIP)..., A linearized model of a helicopter (HC)..., The simple mass (SM) system..., The double mass (DM)... |
| Dataset Splits | No | The paper describes data generation ("We generated 150 tasks for each domain...") and how data is used per iteration ("...the learner observed a task through 50 trajectories of 150 steps and performed algorithmic updates."), but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 split, k-fold cross-validation). |
| Hardware Specification | No | "To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes." This mentions the number of nodes but lacks specific details on the CPU or GPU models, or the memory, of these nodes. |
| Software Dependencies | No | "To distribute our computations, we made use of MATLAB's parallel pool running on 10 nodes." This specifies software by name but lacks version numbers for MATLAB or any other libraries. |
| Experiment Setup | No | The paper specifies the number of iterations (200), trajectories (50), and steps (150). It also mentions the regularization terms with μ1 and μ2, but does not provide specific numerical values for these hyperparameters or other training configurations like learning rates, optimizers, or model initialization details. |
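For the Dataset Splits row: the paper generates 150 tasks per domain but reports no train/validation/test partition. As an illustration of the kind of explicit split the review finds missing (an assumed 80/10/10 ratio, not anything stated in the paper), a minimal sketch:

```python
import random

def split_tasks(tasks, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a task list and partition it into train/validation/test subsets.

    The 80/10/10 ratio is purely illustrative; the paper specifies no split.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = tasks[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# 150 tasks per domain, as generated in the paper's experiments
train_set, val_set, test_set = split_tasks(list(range(150)))
print(len(train_set), len(val_set), len(test_set))  # 120 15 15
```

Reporting such a partition (and the seed) is what would let the per-domain results be reproduced from the same task pool.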