Autonomous Cross-Domain Knowledge Transfer in Lifelong Policy Gradient Reinforcement Learning
Authors: Haitham Bou Ammar, Eric Eaton, José Marcio Luna, Paul Ruvolo
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the ability of our approach to learn optimal control policies for multiple consecutive tasks from six different dynamical systems... and Figure 2 shows the average learning performance on individual domains after this process of interleaved lifelong learning, depicting domains in which cross-domain transfer shows clear advantages over PG-ELLA and PG (e.g., DCP, HC), and an example domain where cross-domain transfer is less effective (CP). |
| Researcher Affiliation | Academia | Haitham Bou Ammar, Univ. of Pennsylvania (haithamb@seas.upenn.edu); Eric Eaton, Univ. of Pennsylvania (eeaton@cis.upenn.edu); José Marcio Luna, Univ. of Pennsylvania (joseluna@seas.upenn.edu); Paul Ruvolo, Olin College of Engineering (paul.ruvolo@olin.edu) |
| Pseudocode | No | The paper describes the algorithm steps and equations, but does not include a formally structured pseudocode or algorithm block. |
| Open Source Code | Yes | The complete implementation of our approach is available on the authors' websites. |
| Open Datasets | No | For each of these systems, we created three different tasks by varying the system parameters to create systems with different dynamics, yielding 18 tasks total. The paper does not provide concrete access information for these created tasks/datasets. |
| Dataset Splits | No | The paper describes training with '100 sampled trajectories of length 50' and interleaved training rounds, but does not specify explicit train/validation/test dataset splits or percentages for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Natural Actor-Critic' as the base PG learner but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes (see the sketch below the table) | All regularization parameters (the µ's) were set to e^-5, and the learning rates and latent dimensions were set via cross-validation over a few tasks. |
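
To make the quoted setup concrete, below is a minimal, hypothetical Python sketch of that configuration. The regularization value e^-5 and the trajectory counts (100 trajectories of length 50) come from the quotes above; the candidate grids, function names, and scoring stub are our assumptions, not the authors' released implementation (which they state is available on their websites).

```python
import itertools
import numpy as np

# Setup values quoted from the paper.
MU = np.exp(-5)          # all regularization parameters (the mu's) set to e^-5
N_TRAJECTORIES = 100     # sampled trajectories per task
TRAJ_LENGTH = 50         # steps per trajectory

# Assumed candidate grids: the paper says learning rates and latent
# dimensions were set via cross-validation over a few tasks, but does not
# report the grids, so these values are illustrative only.
LEARNING_RATES = [1e-4, 1e-3, 1e-2]
LATENT_DIMS = [2, 3, 5]

def average_return(tasks, lr, k, rng):
    """Stand-in for training the lifelong learner with learning rate `lr`
    and latent dimension `k`, then averaging returns over `tasks`.
    A real reproduction would run the PG-ELLA-style update here."""
    return rng.normal()  # placeholder score so the sketch executes end to end

def cross_validate(tasks, seed=0):
    """Grid search over (learning rate, latent dimension), as described."""
    rng = np.random.default_rng(seed)
    scores = {cfg: average_return(tasks, *cfg, rng)
              for cfg in itertools.product(LEARNING_RATES, LATENT_DIMS)}
    lr, k = max(scores, key=scores.get)
    return {"mu": MU, "learning_rate": lr, "latent_dim": k}

if __name__ == "__main__":
    # Domain abbreviations (CP, DCP, HC) are those quoted from the paper.
    print(cross_validate(tasks=["CP", "DCP", "HC"]))
```

The `average_return` stub stands in for a full cross-domain lifelong-learning run; a faithful reproduction would replace it with the authors' released code and the 18 tasks described above.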