Online Multi-Task Learning for Policy Gradient Methods

Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients both in the initial and final performance.
Researcher Affiliation | Academia | Haitham Bou Ammar (HAITHAMB@SEAS.UPENN.EDU) and Eric Eaton (EEATON@CIS.UPENN.EDU), University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA 19104 USA; Paul Ruvolo (PAUL.RUVOLO@OLIN.EDU), Olin College of Engineering, Needham, MA 02492 USA; Matthew E. Taylor (TAYLORM@EECS.WSU.EDU), Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA 99164 USA
Pseudocode | Yes (see the sketch after the table) | Algorithm 1 PG-ELLA (k, λ, µ)
Open Source Code | No | The paper does not include any statement or link indicating that the source code for their methodology is open-source or publicly available.
Open Datasets | No | The paper describes benchmark dynamical systems and how tasks were generated by varying parameters, but it does not provide concrete access information (link, DOI, citation with author/year, or mention of a standard public dataset with access details) for a publicly available or open dataset used for training. For example, it states: 'We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1.'
Dataset Splits | No | The paper mentions 'The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks.' However, it does not provide specific data splits (percentages or counts) for training, validation, or testing for the main experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The paper mentions 'eNAC' but does not specify a version number for this or any other software dependencies.
Experiment Setup | Yes (see the config sketch after the table) | At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP) with 150 time steps each to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task. To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The stepsize for each task domain was determined by a line search after gathering 10 trajectories of length 150.
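To make the Algorithm 1 row concrete, here is a minimal sketch of one PG-ELLA-style learning session, assuming the paper's factored policy parameters θ_t = L s_t with an L1-regularized coefficient update and a ridge-regularized basis update. The function names, the plain ISTA solver used for the sparse-coding step, and the way the single-task solution alpha_t and its Hessian estimate gamma_t are supplied are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def update_task_coefficients(alpha_t, gamma_t, L, mu, n_iters=200):
    """Sparse coefficients: argmin_s (alpha_t - L s)^T Gamma_t (alpha_t - L s) + mu ||s||_1.
    Solved here with plain ISTA (an assumption; any Lasso-style solver would do)."""
    k = L.shape[1]
    s = np.zeros(k)
    lipschitz = 2.0 * np.linalg.norm(L.T @ gamma_t @ L, 2) + 1e-12
    step = 1.0 / lipschitz
    for _ in range(n_iters):
        grad = 2.0 * L.T @ gamma_t @ (L @ s - alpha_t)
        s = soft_threshold(s - step * grad, step * mu)
    return s

def pg_ella_session(L, A, b, T, alpha_t, gamma_t, lam, mu):
    """One learning session for task t.

    alpha_t : single-task policy-gradient solution for the task
    gamma_t : Hessian estimate of the single-task objective at alpha_t (symmetric PSD)
    A, b, T : running sufficient statistics over the tasks seen so far
    """
    d, k = L.shape
    # 1) Update the sparse task-specific coefficients against the current basis.
    s_t = update_task_coefficients(alpha_t, gamma_t, L, mu)
    # 2) Accumulate the Kronecker-structured statistics for the basis update.
    A = A + np.kron(np.outer(s_t, s_t), gamma_t)
    b = b + np.kron(s_t, gamma_t @ alpha_t)
    T = T + 1
    # 3) Recompute the shared basis in closed form (ridge-regularized least squares).
    vec_L = np.linalg.solve(A / T + lam * np.eye(d * k), b / T)
    L = vec_L.reshape((d, k), order="F")  # column-stacked vec(L)
    # 4) The task policy parameters are the factored product.
    theta_t = L @ s_t
    return L, A, b, T, s_t, theta_t
```

In a full run, alpha_t and gamma_t would come from the base policy-gradient learner (eNAC in the paper) applied to the trajectories collected during that session.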
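For readability, the settings quoted in the Experiment Setup row can also be collected into a single configuration sketch. The key names and structure below are ours, not the paper's; SM, CP, and 3CP are kept as abbreviated in the quote.

```python
# Hypothetical summary of the quoted experiment settings, for readability only.
experiment_setup = {
    "base_policy_gradient_learner": "eNAC (Peters & Schaal, 2008)",
    "tasks_per_domain": 30,  # generated by varying system parameters over Table 1 ranges
    "trajectories_per_session": {"SM": 50, "CP": 50, "3CP": 20},
    "time_steps_per_trajectory": 150,
    "latent_dimension_k": "chosen per domain by cross-validation over 10 tasks",
    "step_size": "line search after gathering 10 trajectories of length 150",
    "stopping_criterion": "at least one learning session experienced for each task",
}
```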