Autonomous Cross-Domain Knowledge Transfer in Lifelong Policy Gradient Reinforcement Learning
Authors: Haitham Bou Ammar, Eric Eaton, José Marcio Luna, Paul Ruvolo
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the ability of our approach to learn optimal control policies for multiple consecutive tasks from six different dynamical systems... and Figure 2 shows the average learning performance on individual domains after this process of interleaved lifelong learning, depicting domains in which cross-domain transfer shows clear advantages over PG-ELLA and PG (e.g., DCP, HC), and an example domain where cross-domain transfer is less effective (CP). |
| Researcher Affiliation | Academia | Haitham Bou Ammar, Univ. of Pennsylvania (haithamb@seas.upenn.edu); Eric Eaton, Univ. of Pennsylvania (eeaton@cis.upenn.edu); José Marcio Luna, Univ. of Pennsylvania (joseluna@seas.upenn.edu); Paul Ruvolo, Olin College of Engineering (paul.ruvolo@olin.edu) |
| Pseudocode | No | The paper describes the algorithm steps and equations, but does not include a formally structured pseudocode or algorithm block. |
| Open Source Code | Yes | The complete implementation of our approach is available on the authors' websites. |
| Open Datasets | No | For each of these systems, we created three different tasks by varying the system parameters to create systems with different dynamics, yielding 18 tasks total. The paper does not provide concrete access information for these created tasks/datasets. |
| Dataset Splits | No | The paper describes training with '100 sampled trajectories of length 50' and interleaved training rounds, but does not specify explicit train/validation/test dataset splits or percentages for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Natural Actor-Critic' as the base PG learner but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes (see the sketch below the table) | All regularization parameters (the µ's) were set to e^-5, and the learning rates and latent dimensions were set via cross-validation over a few tasks. |
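
To make the quoted setup concrete, below is a minimal, hypothetical Python sketch of that configuration. The regularization value e^-5 and the trajectory counts (100 trajectories of length 50) come from the quotes above; the candidate grids, function names, and scoring stub are our assumptions, not the authors' released implementation (which they state is available on their websites).

```python
import itertools
import numpy as np

# Setup values quoted from the paper.
MU = np.exp(-5)          # all regularization parameters (the mu's) set to e^-5
N_TRAJECTORIES = 100     # sampled trajectories per task
TRAJ_LENGTH = 50         # steps per trajectory

# Assumed candidate grids: the paper says learning rates and latent
# dimensions were set via cross-validation over a few tasks, but does not
# report the grids, so these values are illustrative only.
LEARNING_RATES = [1e-4, 1e-3, 1e-2]
LATENT_DIMS = [2, 3, 5]

def average_return(tasks, lr, k, rng):
    """Stand-in for training the lifelong learner with learning rate `lr`
    and latent dimension `k`, then averaging returns over `tasks`.
    A real reproduction would run the PG-ELLA-style update here."""
    return rng.normal()  # placeholder score so the sketch executes end to end

def cross_validate(tasks, seed=0):
    """Grid search over (learning rate, latent dimension), as described."""
    rng = np.random.default_rng(seed)
    scores = {cfg: average_return(tasks, *cfg, rng)
              for cfg in itertools.product(LEARNING_RATES, LATENT_DIMS)}
    lr, k = max(scores, key=scores.get)
    return {"mu": MU, "learning_rate": lr, "latent_dim": k}

if __name__ == "__main__":
    # Domain abbreviations (CP, DCP, HC) are those quoted from the paper.
    print(cross_validate(tasks=["CP", "DCP", "HC"]))
```

The `average_return` stub stands in for a full cross-domain lifelong-learning run; a faithful reproduction would replace it with the authors' released code and the 18 tasks described above.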