Toward Robust Long Range Policy Transfer
Authors: Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun (pp. 9958-9966)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments In this section, we introduce the evaluation tasks in Sec. 5.1 and list the baselines that we intend to compare in Sec. 5.2. The results of the methods evaluated in our environments are discussed in Sec. 5.3. Aside from option-critic (Bacon, Harb, and Precup 2017), all the experiments are trained with PPO (Schulman et al. 2017) and Generalized Advantage Estimation (GAE) (Schulman et al. 2016). The detailed hyperparameter settings are shown in supplementary Sec. 4.1. We further show that our method has a broader transferring range compared with other baselines in Sec. 5.4. Then, we demonstrate the effectiveness of the regularization terms in Sec. 5.5. Finally, we demonstrate that our method can perform well even if the primitive policies are not good enough in Sec. 5.6. |
| Researcher Affiliation | Collaboration | Wei-Cheng Tseng1, Jin-Siang Lin1, Yao-Min Feng1, Min Sun1,2,3 1National Tsing Hua University 2Appier Inc., Taiwan 3MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan |
| Pseudocode | Yes | Algorithm 1: Full Algorithm |
| Open Source Code | Yes | The source code is available to the public: https://weichengtseng.github.io/project_website/aaai21 |
| Open Datasets | No | All the tasks are built with PyBullet. The implementation detail of these tasks will be described in supplementary Sec. 4.2. No specific public dataset is named or linked. |
| Dataset Splits | No | No explicit statement of training, validation, or test dataset splits was found in the main text. Details on hyperparameter settings are deferred to supplementary material. |
| Hardware Specification | No | No specific hardware (GPU, CPU models, cloud instances, etc.) used for running experiments was mentioned. |
| Software Dependencies | No | All the experiments are trained with PPO (Schulman et al. 2017) and Generalized Advantage Estimation (GAE) (Schulman et al. 2016). All the tasks are built with PyBullet. No version numbers are provided for any of these software components. |
| Experiment Setup | No | The detailed hyperparameter settings are shown in supplementary Sec. 4.1. This indicates that the experimental setup is documented in the supplementary material rather than in the main body of the paper. |
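The paper's evidence cells name PPO with Generalized Advantage Estimation (GAE) as the training stack but defer all hyperparameters to the supplement. For reference, the GAE computation itself follows directly from Schulman et al. (2016); a minimal, self-contained sketch (not the authors' code, and using illustrative default values for gamma and lambda) is:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: list of rewards r_t, length T.
    values:  list of value estimates V(s_t), length T + 1
             (the final entry is the bootstrap value for the last state).
    gamma, lam: discount and GAE smoothing factors (illustrative defaults;
                the paper's actual settings are in its supplementary Sec. 4.1).
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` to the full discounted return minus the baseline, matching the bias-variance trade-off GAE is designed to interpolate.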