Toward Robust Long Range Policy Transfer
Authors: Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun (pp. 9958-9966)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments In this section, we introduce the evaluation tasks in Sec. 5.1 and list the baselines that we intend to compare in Sec. 5.2. The results of the methods evaluated in our environments are discussed in Sec. 5.3. Aside from option-critic (Bacon, Harb, and Precup 2017), all the experiments are trained with PPO (Schulman et al. 2017) and Generalized Advantage Estimation (GAE) (Schulman et al. 2016). The detailed hyperparameter settings are shown in supplementary Sec. 4.1. We further show that our method has a broader transferring range compared with other baselines in Sec. 5.4. Then, we demonstrate the effectiveness of the regularization terms in Sec. 5.5. Finally, we demonstrate that our method can perform well even if the primitive policies are not good enough in Sec. 5.6. |
| Researcher Affiliation | Collaboration | Wei-Cheng Tseng1, Jin-Siang Lin1, Yao-Min Feng1, Min Sun1,2,3 1National Tsing Hua University 2Appier Inc., Taiwan 3MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan |
| Pseudocode | Yes | Algorithm 1: Full Algorithm |
| Open Source Code | Yes | The source code is available to the public: https://weichengtseng.github.io/project_website/aaai21 |
| Open Datasets | No | All the tasks are built with PyBullet. The implementation detail of these tasks will be described in supplementary Sec. 4.2. No specific public dataset is named or linked. |
| Dataset Splits | No | No explicit statement of training, validation, or test dataset splits was found in the main text. Details on hyperparameter settings are deferred to supplementary material. |
| Hardware Specification | No | No specific hardware (GPU, CPU models, cloud instances, etc.) used for running experiments was mentioned. |
| Software Dependencies | No | All the experiments are trained with PPO (Schulman et al. 2017) and Generalized Advantage Estimation (GAE) (Schulman et al. 2016). All the tasks are built with PyBullet. No version numbers are provided for any of these software components. |
| Experiment Setup | No | The detailed hyperparameter settings are shown in supplementary Sec. 4.1. This indicates that the experimental setup is documented in the supplementary material rather than in the main body of the paper. |
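The paper's evidence cells name PPO with Generalized Advantage Estimation (GAE) as the training stack but defer all hyperparameters to the supplement. For reference, the GAE computation itself follows directly from Schulman et al. (2016); a minimal, self-contained sketch (not the authors' code, and using illustrative default values for gamma and lambda) is:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: list of rewards r_t, length T.
    values:  list of value estimates V(s_t), length T + 1
             (the final entry is the bootstrap value for the last state).
    gamma, lam: discount and GAE smoothing factors (illustrative defaults;
                the paper's actual settings are in its supplementary Sec. 4.1).
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` to the full discounted return minus the baseline, matching the bias-variance trade-off GAE is designed to interpolate.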