Transfer RL across Observation Feature Spaces via Model-Based Regularization
Authors: Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task. (...) Experiments in 7 environments show that our proposed algorithm significantly improves the learning performance of RL agents in the target task. |
| Researcher Affiliation | Collaboration | Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang (University of Maryland, College Park; Unity Technologies) |
| Pseudocode | Yes | Algorithm 1 Source Task Learning (...) Algorithm 2 Target Task Learning with Transferred Dynamics Models (a hedged sketch of this structure follows the table) |
| Open Source Code | Yes | The source code and running instructions are provided in the supplementary materials. |
| Open Datasets | Yes | We use 3 vector-input environments CartPole, Acrobot and Cheetah-Run as source tasks (...) We use 3 MuJoCo environments: HalfCheetah, Hopper and Walker2d (...) we use an existing game 3DBall contained in the Unity ML-Agents Toolkit (Juliani et al., 2018). |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits. It describes the environments and states that samples are drawn from a replay buffer, but no formal splits are detailed. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It mentions 'GPU' in the context of Deep RL but not as a specific hardware setup. |
| Software Dependencies | No | While the paper mentions algorithms like DQN, SAC, Adam optimizer, and ReLU activation, it does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or environments/toolkits (e.g., Unity ML-Agents Toolkit) that would be needed for replication. |
| Experiment Setup | Yes | Detailed experiment setup and hyperparameters are in Appendix E. (...) In CartPole, we use a replay buffer with size 10000. In the more challenging Acrobot, we use a prioritized replay buffer with size 100000. (...) The number of hidden units for all neural networks is 256. (...) The learning rate is 3e-4. (...) In CartPole, λ is set as 18; in 3DBall, λ is set as 10; in Acrobot, λ is set as 5; in the remaining MuJoCo environments where dynamics are more complicated, λ is set as 1. (the quoted values are collected in a configuration sketch below the table) |
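
The Pseudocode row only names the two procedures from the paper (source task learning and target task learning with transferred dynamics models). The following is a minimal PyTorch sketch of the general idea of regularizing a target-task encoder with a frozen dynamics model transferred from the source task; the names `TargetTaskLearner`, `transferred_dynamics`, and `lambda_coef` are illustrative assumptions, not the authors' released code, and the RL loss itself (DQN or SAC in the paper) is taken as given.

```python
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    # Two hidden ReLU layers of 256 units, matching the width reported in the paper.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class TargetTaskLearner:
    """Illustrative wrapper: a fresh encoder for the target observation space,
    regularized by a latent dynamics model transferred from the source task."""

    def __init__(self, obs_dim, act_dim, latent_dim, transferred_dynamics, lambda_coef):
        self.encoder = mlp(obs_dim, latent_dim)    # trained from scratch in the target task
        self.dynamics = transferred_dynamics       # assumed f([z, a]) -> z', transferred and frozen
        for p in self.dynamics.parameters():
            p.requires_grad_(False)
        self.lambda_coef = lambda_coef              # weight of the model-based regularizer
        self.optim = torch.optim.Adam(self.encoder.parameters(), lr=3e-4)

    def regularizer(self, obs, act, next_obs):
        # Push the new encoder toward the latent space in which the transferred
        # dynamics model is predictive (actions assumed continuous or one-hot).
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        z_next_pred = self.dynamics(torch.cat([z, act], dim=-1))
        return ((z_next_pred - z_next) ** 2).mean()

    def update(self, rl_loss, obs, act, next_obs):
        # Total objective: ordinary RL loss plus lambda * dynamics-consistency loss.
        loss = rl_loss + self.lambda_coef * self.regularizer(obs, act, next_obs)
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return loss.item()
```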
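
The Experiment Setup row quotes a handful of concrete hyperparameters. Collecting only those quoted values for a re-run might look like the following; the field names and per-environment grouping are our own, and anything not stated in the row (batch size, discount factor, network depth, and so on) is deliberately left out.

```python
# Hyperparameters quoted in the Experiment Setup row; unspecified settings are omitted.
EXPERIMENT_SETUP = {
    "hidden_units": 256,        # "The number of hidden units for all neural networks is 256."
    "learning_rate": 3e-4,      # "The learning rate is 3e-4."
    "replay_buffer": {
        "CartPole": {"size": 10_000, "prioritized": False},
        "Acrobot": {"size": 100_000, "prioritized": True},
    },
    "lambda": {                 # weight of the model-based regularizer per target task
        "CartPole": 18,
        "3DBall": 10,
        "Acrobot": 5,
        "MuJoCo (HalfCheetah, Hopper, Walker2d)": 1,
    },
}
```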