Online Multi-Task Learning for Policy Gradient Methods
Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients both in the initial and final performance. |
| Researcher Affiliation | Academia | Haitham Bou Ammar (haithamb@seas.upenn.edu) and Eric Eaton (eeaton@cis.upenn.edu), University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA 19104 USA; Paul Ruvolo (paul.ruvolo@olin.edu), Olin College of Engineering, Needham, MA 02492 USA; Matthew E. Taylor (taylorm@eecs.wsu.edu), Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA 99164 USA |
| Pseudocode | Yes | Algorithm 1 PG-ELLA (k, λ, µ). A hedged sketch of this update appears below the table. |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | No | The paper describes benchmark dynamical systems and how tasks were generated by varying parameters, but it does not provide concrete access information (link, DOI, citation with author/year, or mention of a standard public dataset with access details) for a publicly available or open dataset used for training. For example, it states: 'We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1.' |
| Dataset Splits | No | The paper mentions 'The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks.' However, it does not provide specific data splits (percentages or counts) for training, validation, or testing for the main experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The paper names eNAC as the base learner but does not specify a version number for it or for any other software dependency. |
| Experiment Setup | Yes | At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP) with 150 time steps each to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task. To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The stepsize for each task domain was determined by a line search after gathering 10 trajectories of length 150. A sketch of this setup appears below the table. |
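For readers who want a concrete picture of Algorithm 1, the following is a minimal Python sketch of one PG-ELLA learning session, assuming the policy parameters for each task factor as θ(t) = L s(t) with a shared latent basis L (d × k) and sparse task-specific coefficients s(t), as in the paper. The helper callables (`single_task_pg`, `hessian_gamma`, `lasso`) and all variable names are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def pg_ella_session(L, A, b, T, task, lam, mu,
                    single_task_pg, hessian_gamma, lasso):
    """Hedged sketch of one PG-ELLA learning session for `task`.

    L : (d, k) shared latent basis; A : (k*d, k*d) and b : (k*d,) are
    running sufficient statistics for refitting L; T counts sessions.
    """
    # 1. Run the base policy-gradient learner (the paper uses eNAC) on
    #    the trajectories gathered this session to obtain alpha, a
    #    single-task parameter estimate (d,), and Gamma, a (d, d)
    #    Hessian-like matrix around alpha.
    alpha = single_task_pg(task)          # assumed helper
    Gamma = hessian_gamma(task, alpha)    # assumed helper

    # 2. Recompute the sparse code s (k,) for this task via an
    #    L1-regularized (mu) projection of alpha onto the current basis.
    s = lasso(L, alpha, Gamma, mu)        # assumed helper

    # 3. Accumulate the Kronecker-structured statistics used to refit L
    #    in closed form, as in ELLA. (On a task revisit, the task's
    #    previous contribution would first be subtracted; omitted here.)
    d, k = L.shape
    A = A + np.kron(np.outer(s, s), Gamma)   # quadratic term in vec(L)
    b = b + np.kron(s, Gamma @ alpha)        # linear term in vec(L)
    T = T + 1

    # 4. Refit the shared basis by solving the regularized linear system
    #    (A/T + lam*I) vec(L) = b/T, then unstack columns back to (d, k).
    L_vec = np.linalg.solve(A / T + lam * np.eye(k * d), b / T)
    L = L_vec.reshape(k, d).T
    return L, A, b, T, s
```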
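The quoted data budget and step-size procedure can likewise be summarized in a short sketch. The per-domain trajectory counts and the 150-step horizon come from the quote above (SM & CP: 50 trajectories per session; 3CP: 20); the candidate step sizes and the helper callables are hypothetical assumptions, not values from the paper.

```python
import numpy as np

# Per-session trajectory budgets from the quoted setup; every domain
# uses a horizon of 150 time steps per trajectory.
SESSION_BUDGET = {"SM": 50, "CP": 50, "3CP": 20}
HORIZON = 150

def line_search_stepsize(theta, gather, grad_fn, eval_fn,
                         candidates=(1e-4, 1e-3, 1e-2, 1e-1)):
    """Pick a step size by line search after gathering 10 trajectories
    of length HORIZON, mirroring the quoted procedure. `gather`,
    `grad_fn`, `eval_fn`, and `candidates` are hypothetical stand-ins."""
    trajs = gather(n=10, horizon=HORIZON)
    g = grad_fn(theta, trajs)
    returns = [eval_fn(theta + eta * g) for eta in candidates]
    return candidates[int(np.argmax(returns))]
```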