Meta Reinforcement Learning with Task Embedding and Shared Policy
Authors: Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines. |
| Researcher Affiliation | Collaboration | Lin Lan1, Zhenguo Li2, Xiaohong Guan1,3,4 and Pinghui Wang3,1; 1MOE NSKEY Lab, Xi'an Jiaotong University, China; 2Huawei Noah's Ark Lab; 3Shenzhen Research School, Xi'an Jiaotong University, China; 4Department of Automation and NLIST Lab, Tsinghua University, China; llan@sei.xjtu.edu.cn, li.zhenguo@huawei.com, {xhguan, phwang}@mail.xjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training Procedure of TESP |
| Open Source Code | Yes | Code available at https://github.com/llan-ml/tesp. |
| Open Datasets | No | The paper states tasks are sampled within the MuJoCo simulator, and refers to generated tasks ("we sample 100 target locations... as training tasks D"). It does not provide access information for a pre-existing public dataset. |
| Dataset Splits | No | The paper discusses training and testing on different sets of sampled tasks (D, D', D'') and performing evaluations, but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, and testing as one would for a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments, only mentioning the use of the MuJoCo simulator. |
| Software Dependencies | No | The paper mentions software components like MuJoCo simulator, VPG, and PPO, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper mentions general experimental settings, such as setting K to 3 and using VPG for the fast-update and PPO for the meta-update (see the sketch after this table), but it explicitly defers the 'detailed settings of environments and experiments' to a supplementary resource at the provided GitHub link rather than including them in the main text. |
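
The Pseudocode and Experiment Setup rows above describe the overall training structure (Algorithm 1: K = 3 fast-update steps per task with VPG, followed by a PPO meta-update of the shared components). A minimal sketch of that loop structure is given below, assuming hypothetical helper callables (`collect_rollouts`, `vpg_fast_update`, `ppo_meta_update`) and interfaces that are not taken from the paper; in particular, adapting only a per-task embedding in the inner loop is an assumption of this sketch. The authors' actual implementation is in the linked repository (https://github.com/llan-ml/tesp).

```python
from typing import Any, Callable, Iterable, List

K_FAST_UPDATES = 3  # "setting K to 3" in the paper's experiment description


def meta_train_iteration(tasks: Iterable[Any],
                         shared_policy: Any,
                         task_encoder: Callable[[Any], Any],
                         collect_rollouts: Callable[[Any, Any, Any], Any],
                         vpg_fast_update: Callable[[Any, Any, Any], Any],
                         ppo_meta_update: Callable[[Any, Any, List[Any]], None]) -> None:
    """One meta-training iteration over a batch of sampled tasks (sketch only)."""
    post_adaptation_rollouts: List[Any] = []
    for task in tasks:
        # Assumed structure: the encoder provides an initial per-task embedding,
        # and only this embedding is adapted during the fast-updates.
        embedding = task_encoder(task)
        for _ in range(K_FAST_UPDATES):
            rollouts = collect_rollouts(task, shared_policy, embedding)
            # Inner loop: a vanilla-policy-gradient (VPG) style fast-update.
            embedding = vpg_fast_update(embedding, shared_policy, rollouts)
        post_adaptation_rollouts.append(
            collect_rollouts(task, shared_policy, embedding))
    # Outer loop: PPO-style meta-update of the shared policy (and the encoder)
    # using post-adaptation experience from all tasks in the batch.
    ppo_meta_update(shared_policy, task_encoder, post_adaptation_rollouts)
```

Concrete rollout collection, the VPG gradient step, and the PPO objective are left abstract here because the paper defers those details to its supplementary material and code.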