Online Multi-Task Learning for Policy Gradient Methods

Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients both in the initial and final performance.
Researcher Affiliation | Academia | Haitham Bou Ammar (HAITHAMB@SEAS.UPENN.EDU) and Eric Eaton (EEATON@CIS.UPENN.EDU), University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA 19104 USA; Paul Ruvolo (PAUL.RUVOLO@OLIN.EDU), Olin College of Engineering, Needham, MA 02492 USA; Matthew E. Taylor (TAYLORM@EECS.WSU.EDU), Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA 99164 USA
Pseudocode | Yes (see the sketch after the table) | Algorithm 1 PG-ELLA (k, λ, µ)
Open Source Code | No | The paper does not include any statement or link indicating that the source code for their methodology is open-source or publicly available.
Open Datasets | No | The paper describes benchmark dynamical systems and how tasks were generated by varying parameters, but it does not provide concrete access information (link, DOI, citation with author/year, or mention of a standard public dataset with access details) for a publicly available or open dataset used for training. For example, it states: 'We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1.'
Dataset Splits | No | The paper mentions 'The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks.' However, it does not provide specific data splits (percentages or counts) for training, validation, or testing for the main experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The paper mentions 'eNAC' but does not specify a version number for this or any other software dependencies.
Experiment Setup | Yes (see the config sketch after the table) | At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP) with 150 time steps each to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task. To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The stepsize for each task domain was determined by a line search after gathering 10 trajectories of length 150.
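To make the Algorithm 1 row concrete, here is a minimal sketch of one PG-ELLA-style learning session, assuming the paper's factored policy parameters θ_t = L s_t with an L1-regularized coefficient update and a ridge-regularized basis update. The function names, the plain ISTA solver used for the sparse-coding step, and the way the single-task solution alpha_t and its Hessian estimate gamma_t are supplied are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def update_task_coefficients(alpha_t, gamma_t, L, mu, n_iters=200):
    """Sparse coefficients: argmin_s (alpha_t - L s)^T Gamma_t (alpha_t - L s) + mu ||s||_1.
    Solved here with plain ISTA (an assumption; any Lasso-style solver would do)."""
    k = L.shape[1]
    s = np.zeros(k)
    lipschitz = 2.0 * np.linalg.norm(L.T @ gamma_t @ L, 2) + 1e-12
    step = 1.0 / lipschitz
    for _ in range(n_iters):
        grad = 2.0 * L.T @ gamma_t @ (L @ s - alpha_t)
        s = soft_threshold(s - step * grad, step * mu)
    return s

def pg_ella_session(L, A, b, T, alpha_t, gamma_t, lam, mu):
    """One learning session for task t.

    alpha_t : single-task policy-gradient solution for the task
    gamma_t : Hessian estimate of the single-task objective at alpha_t (symmetric PSD)
    A, b, T : running sufficient statistics over the tasks seen so far
    """
    d, k = L.shape
    # 1) Update the sparse task-specific coefficients against the current basis.
    s_t = update_task_coefficients(alpha_t, gamma_t, L, mu)
    # 2) Accumulate the Kronecker-structured statistics for the basis update.
    A = A + np.kron(np.outer(s_t, s_t), gamma_t)
    b = b + np.kron(s_t, gamma_t @ alpha_t)
    T = T + 1
    # 3) Recompute the shared basis in closed form (ridge-regularized least squares).
    vec_L = np.linalg.solve(A / T + lam * np.eye(d * k), b / T)
    L = vec_L.reshape((d, k), order="F")  # column-stacked vec(L)
    # 4) The task policy parameters are the factored product.
    theta_t = L @ s_t
    return L, A, b, T, s_t, theta_t
```

In a full run, alpha_t and gamma_t would come from the base policy-gradient learner (eNAC in the paper) applied to the trajectories collected during that session.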
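For readability, the settings quoted in the Experiment Setup row can also be collected into a single configuration sketch. The key names and structure below are ours, not the paper's; SM, CP, and 3CP are kept as abbreviated in the quote.

```python
# Hypothetical summary of the quoted experiment settings, for readability only.
experiment_setup = {
    "base_policy_gradient_learner": "eNAC (Peters & Schaal, 2008)",
    "tasks_per_domain": 30,  # generated by varying system parameters over Table 1 ranges
    "trajectories_per_session": {"SM": 50, "CP": 50, "3CP": 20},
    "time_steps_per_trajectory": 150,
    "latent_dimension_k": "chosen per domain by cross-validation over 10 tasks",
    "step_size": "line search after gathering 10 trajectories of length 150",
    "stopping_criterion": "at least one learning session experienced for each task",
}
```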