Transfer RL across Observation Feature Spaces via Model-Based Regularization
Authors: Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task. (...) Experiments in 7 environments show that our proposed algorithm significantly improves the learning performance of RL agents in the target task. |
| Researcher Affiliation | Collaboration | Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang (University of Maryland, College Park; Unity Technologies) |
| Pseudocode | Yes | Algorithm 1 Source Task Learning (...) Algorithm 2 Target Task Learning with Transferred Dynamics Models (a hedged sketch of this structure follows the table) |
| Open Source Code | Yes | The source code and running instructions are provided in the supplementary materials. |
| Open Datasets | Yes | We use 3 vector-input environments CartPole, Acrobot and Cheetah-Run as source tasks (...) We use 3 MuJoCo environments: HalfCheetah, Hopper and Walker2d (...) we use an existing game 3DBall contained in the Unity ML-Agents Toolkit (Juliani et al., 2018). |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits. It describes the environments and states that samples are drawn from a replay buffer, but no formal splits are detailed. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It mentions 'GPU' in the context of Deep RL but not as a specific hardware setup. |
| Software Dependencies | No | While the paper mentions algorithms like DQN, SAC, Adam optimizer, and ReLU activation, it does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or environments/toolkits (e.g., Unity ML-Agents Toolkit) that would be needed for replication. |
| Experiment Setup | Yes | Detailed experiment setup and hyperparameters are in Appendix E. (...) In CartPole, we use a replay buffer with size 10000. In the more challenging Acrobot, we use a prioritized replay buffer with size 100000. (...) The number of hidden units for all neural networks is 256. (...) The learning rate is 3e-4. (...) In CartPole, λ is set as 18; in 3DBall, λ is set as 10; in Acrobot, λ is set as 5; in the remaining MuJoCo environments where dynamics are more complicated, λ is set as 1. (the quoted values are collected in a configuration sketch below the table) |
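
The Pseudocode row only names the two procedures from the paper (source task learning and target task learning with transferred dynamics models). The following is a minimal PyTorch sketch of the general idea of regularizing a target-task encoder with a frozen dynamics model transferred from the source task; the names `TargetTaskLearner`, `transferred_dynamics`, and `lambda_coef` are illustrative assumptions, not the authors' released code, and the RL loss itself (DQN or SAC in the paper) is taken as given.

```python
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    # Two hidden ReLU layers of 256 units, matching the width reported in the paper.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class TargetTaskLearner:
    """Illustrative wrapper: a fresh encoder for the target observation space,
    regularized by a latent dynamics model transferred from the source task."""

    def __init__(self, obs_dim, act_dim, latent_dim, transferred_dynamics, lambda_coef):
        self.encoder = mlp(obs_dim, latent_dim)    # trained from scratch in the target task
        self.dynamics = transferred_dynamics       # assumed f([z, a]) -> z', transferred and frozen
        for p in self.dynamics.parameters():
            p.requires_grad_(False)
        self.lambda_coef = lambda_coef              # weight of the model-based regularizer
        self.optim = torch.optim.Adam(self.encoder.parameters(), lr=3e-4)

    def regularizer(self, obs, act, next_obs):
        # Push the new encoder toward the latent space in which the transferred
        # dynamics model is predictive (actions assumed continuous or one-hot).
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        z_next_pred = self.dynamics(torch.cat([z, act], dim=-1))
        return ((z_next_pred - z_next) ** 2).mean()

    def update(self, rl_loss, obs, act, next_obs):
        # Total objective: ordinary RL loss plus lambda * dynamics-consistency loss.
        loss = rl_loss + self.lambda_coef * self.regularizer(obs, act, next_obs)
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return loss.item()
```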
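
The Experiment Setup row quotes a handful of concrete hyperparameters. Collecting only those quoted values for a re-run might look like the following; the field names and per-environment grouping are our own, and anything not stated in the row (batch size, discount factor, network depth, and so on) is deliberately left out.

```python
# Hyperparameters quoted in the Experiment Setup row; unspecified settings are omitted.
EXPERIMENT_SETUP = {
    "hidden_units": 256,        # "The number of hidden units for all neural networks is 256."
    "learning_rate": 3e-4,      # "The learning rate is 3e-4."
    "replay_buffer": {
        "CartPole": {"size": 10_000, "prioritized": False},
        "Acrobot": {"size": 100_000, "prioritized": True},
    },
    "lambda": {                 # weight of the model-based regularizer per target task
        "CartPole": 18,
        "3DBall": 10,
        "Acrobot": 5,
        "MuJoCo (HalfCheetah, Hopper, Walker2d)": 1,
    },
}
```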