Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment
Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor
AAAI 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL. |
| Researcher Affiliation | Academia | Haitham Bou Ammar, Univ. of Pennsylvania; Eric Eaton, Univ. of Pennsylvania; Paul Ruvolo, Olin College of Engineering; Matthew E. Taylor, Washington State Univ. |
| Pseudocode | Yes | Algorithm 1 Manifold Alignment Cross-Domain Transfer for Policy Gradients (MAXDT-PG) |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly available or provide a direct link to a code repository. |
| Open Datasets | No | The paper describes experiments conducted on simulated dynamical systems (Simple Mass Spring Damper, Cart Pole, Three-Link Cart Pole, Quadrotor) by generating traces and samples, rather than using a pre-existing, publicly available dataset for which access information is provided. |
| Dataset Splits | No | The paper describes generating 'traces' and 'samples' from simulated dynamical systems, and evaluating performance based on learning iterations, but does not specify explicit train/validation/test dataset splits with percentages, counts, or predefined partition methodologies. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Figure 3 shows MAXDT-PG's performance using varying numbers of source and target samples to learn χS. These results reveal that transfer-initialized policies outperform standard policy gradient initialization. Further, as the number of samples used to learn χS increases, so do both the initial and final performance in all domains. All initializations result in equal per-iteration computational cost. Therefore, MAXDT-PG both improves sample complexity and reduces wall-clock learning time. [...] Rewards were averaged over 500 traces collected from 150 initial states. |
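The core step the table refers to is learning a cross-domain state mapping from source and target samples, which is then used to transfer-initialize the target policy. As a rough illustration only, the sketch below fits a linear least-squares map between paired, centered source and target state samples. This is a heavily simplified stand-in: MAXDT-PG's actual procedure is an *unsupervised* manifold alignment (no paired samples), and the function name `fit_state_map` and its paired-data assumption are inventions for this example.

```python
import numpy as np

def fit_state_map(source_states, target_states):
    """Fit a linear map M from source states to target states.

    Simplified sketch: assumes paired samples (n, d_s) and (n, d_t),
    unlike the unsupervised alignment used by MAXDT-PG. Centers both
    sample sets, then solves min_M ||S_c @ M - T_c||_F by least squares.
    Returns (M, source_mean, target_mean) so new source states can be
    mapped into the target domain.
    """
    src_mean = source_states.mean(axis=0)
    tgt_mean = target_states.mean(axis=0)
    src_c = source_states - src_mean
    tgt_c = target_states - tgt_mean
    # Least-squares solution for the centered linear map
    M, *_ = np.linalg.lstsq(src_c, tgt_c, rcond=None)
    return M, src_mean, tgt_mean

def map_states(states, M, src_mean, tgt_mean):
    """Project source-domain states into the target domain."""
    return (states - src_mean) @ M + tgt_mean
```

Under this sketch, source-domain trajectories would be pushed through `map_states` to produce target-domain pseudo-trajectories, from which an initial target policy could be fit before policy gradient refinement, mirroring the transfer-initialization the table describes.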