Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment

Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL.
Researcher Affiliation | Academia | Haitham Bou Ammar (Univ. of Pennsylvania, haithamb@seas.upenn.edu); Eric Eaton (Univ. of Pennsylvania, eeaton@cis.upenn.edu); Paul Ruvolo (Olin College of Engineering, paul.ruvolo@olin.edu); Matthew E. Taylor (Washington State Univ., taylorm@eecs.wsu.edu)
Pseudocode | Yes | Algorithm 1: Manifold Alignment Cross-Domain Transfer for Policy Gradients (MAXDT-PG)
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes experiments on simulated dynamical systems (Simple Mass Spring Damper, Cart Pole, Three-Link Cart Pole, Quadrotor) for which traces and samples are generated, rather than on a pre-existing, publicly available dataset with access information.
Dataset Splits | No | The paper describes generating traces and samples from simulated dynamical systems and evaluating performance over learning iterations, but does not specify explicit train/validation/test splits with percentages, counts, or a predefined partitioning methodology.
Hardware Specification | No | The paper does not provide hardware details, such as exact GPU/CPU models, processor types, or memory amounts, for the machines used to run its experiments.
Software Dependencies | No | The paper does not list the ancillary software, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | Figure 3 shows MAXDT-PG's performance using varying numbers of source and target samples to learn χS. These results reveal that transfer-initialized policies outperform standard policy gradient initialization. Further, as the number of samples used to learn χS increases, so do both the initial and final performance in all domains. All initializations result in equal per-iteration computational cost; therefore, MAXDT-PG both improves sample complexity and reduces wall-clock learning time. [...] Rewards were averaged over 500 traces collected from 150 initial states.
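
To make the quoted evaluation protocol concrete, the following is a minimal sketch, not the authors' code: it averages returns over traces started from a fixed set of initial states and compares a transfer-initialized policy against a standard random initialization on a toy mass-spring-damper system (one of the domains listed above). The dynamics constants, quadratic reward, linear-Gaussian policy form, and the parameter vectors theta_standard and theta_transfer are illustrative assumptions; only the trace-averaging protocol follows the paper's description.

```python
import numpy as np

# Sketch of the evaluation protocol described above (assumed details marked):
# average returns over traces collected from a fixed set of initial states,
# comparing a transfer-initialized policy to a standard initialization.

M, K, C, DT = 1.0, 10.0, 1.0, 0.05   # assumed mass, spring, damping, time step


def step(x, u):
    """One Euler step of a toy mass-spring-damper; quadratic cost (assumed)."""
    pos, vel = x
    acc = (-K * pos - C * vel + u) / M
    x_next = np.array([pos + DT * vel, vel + DT * acc])
    reward = -(pos ** 2 + 0.01 * u ** 2)
    return x_next, reward


def trace_return(theta, x0, horizon=200, noise=0.1, rng=None):
    """Cumulative reward of one trace under a linear-Gaussian policy (assumed form)."""
    rng = rng or np.random.default_rng()
    x, total = np.asarray(x0, dtype=float), 0.0
    for _ in range(horizon):
        u = float(theta @ x) + noise * rng.standard_normal()
        x, r = step(x, u)
        total += r
    return total


def average_reward(theta, initial_states, traces_per_state, rng=None):
    """Average return over `traces_per_state` traces from each initial state."""
    rng = rng or np.random.default_rng()
    returns = [trace_return(theta, x0, rng=rng)
               for x0 in initial_states
               for _ in range(traces_per_state)]
    return float(np.mean(returns))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    initial_states = rng.uniform(-1.0, 1.0, size=(150, 2))  # 150 initial states
    theta_standard = 0.01 * rng.standard_normal(2)           # standard PG initialization (assumed)
    theta_transfer = np.array([-3.0, -1.5])                  # stand-in for a transferred policy
    # The paper reports ~500 traces over 150 initial states; the allocation here is illustrative.
    for name, theta in [("standard init", theta_standard), ("transfer init", theta_transfer)]:
        print(name, average_reward(theta, initial_states, traces_per_state=3, rng=rng))
```

Under this protocol, an initialization that already stabilizes the system yields a higher average reward before any policy gradient iterations, which is the kind of head start the quoted passage attributes to transfer-initialized policies.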