Visual Transfer For Reinforcement Learning Via Wasserstein Domain Confusion
Authors: Josh Roy, George D. Konidaris
Pages: 9454-9462
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Results: We validate our novel Wasserstein Confusion loss term and WAPPO algorithm on 17 environments: Visual Cartpole and both the easy and hard versions of 16 OpenAI Procgen environments. |
| Researcher Affiliation | Academia | Josh Roy and George Konidaris, Brown University (joshnroy@gmail.com, gdk@cs.brown.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper states: 'For further details, please see the appendix and the code distributed with Cobbe et al. (2019a).' A footnote on 'code' points to https://github.com/openai/procgen. That link leads to the code of the baseline (PPO), not to the authors' WAPPO implementation or their modifications for the method. |
| Open Datasets | Yes | We validate our novel Wasserstein Confusion loss term and WAPPO algorithm on 17 environments: Visual Cartpole and both the easy and hard versions of 16 OpenAI Procgen environments. |
| Dataset Splits | No | The paper states: 'For each environment evaluated, the agent trains using WAPPO with full access to the source domain and a buffer of 5000 observations from the target domain.' However, it does not specify explicit training/validation/test dataset splits or cross-validation settings typically associated with model validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'PPO implementation' and 'Leaky ReLU activations' but does not provide specific version numbers for software dependencies such as Python, deep learning frameworks, or libraries. |
| Experiment Setup | Yes | We utilize the PPO implementation and hyperparameters provided with (Cobbe et al. 2019a). We use these same hyperparameters for the other methods tested and do not perform any hyperparameter searches. |