Meta Reinforcement Learning with Task Embedding and Shared Policy

Authors: Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines." |
| Researcher Affiliation | Collaboration | Lin Lan (1), Zhenguo Li (2), Xiaohong Guan (1,3,4), and Pinghui Wang (3,1). (1) MOE NSKEY Lab, Xi'an Jiaotong University, China; (2) Huawei Noah's Ark Lab; (3) Shenzhen Research School, Xi'an Jiaotong University, China; (4) Department of Automation and NLIST Lab, Tsinghua University, China. llan@sei.xjtu.edu.cn, li.zhenguo@huawei.com, {xhguan, phwang}@mail.xjtu.edu.cn |
| Pseudocode | Yes | "Algorithm 1: Training Procedure of TESP" |
| Open Source Code | Yes | "Code available at https://github.com/llan-ml/tesp." |
| Open Datasets | No | Tasks are sampled within the MuJoCo simulator ("we sample 100 target locations... as training tasks D"); the paper does not provide access information for any pre-existing public dataset. |
| Dataset Splits | No | The paper trains and tests on different sets of sampled tasks (D, D', D''), but it does not specify explicit splits (e.g., percentages or counts) for training, validation, and testing as one would for a fixed dataset. |
| Hardware Specification | No | The paper does not detail the hardware used to run the experiments (e.g., CPU or GPU models, memory), mentioning only the use of the MuJoCo simulator. |
| Software Dependencies | No | The paper names software components such as the MuJoCo simulator, VPG, and PPO, but gives no version numbers for these or any other dependencies. |
| Experiment Setup | No | The paper reports general settings (e.g., K = 3 fast-update steps, VPG for the fast-update and PPO for the meta-update), but explicitly defers "detailed settings of environments and experiments" to a supplementary resource at the provided GitHub link rather than including them in the main text. |
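The training structure the assessment refers to (a per-task embedding adapted with K = 3 fast-update steps while shared parameters are meta-updated across tasks) can be illustrated with a toy sketch. This is not the paper's method: the quadratic loss standing in for a negative RL return, the learning rates, the first-order meta-gradient, and all function names are assumptions for illustration; the paper uses VPG for the fast-update and PPO for the meta-update inside MuJoCo.

```python
import numpy as np

# Toy sketch of "task embedding + shared parameters" meta-learning.
# The quadratic loss 0.5 * ||w + e - target||^2 is a hypothetical stand-in
# for a negative return; analytic gradients replace VPG/PPO.

rng = np.random.default_rng(0)

DIM = 4          # embedding / parameter dimension (arbitrary)
K = 3            # fast-update steps, matching K = 3 reported in the paper
INNER_LR = 0.1   # fast-update learning rate (assumed)
META_LR = 0.05   # meta-update learning rate (assumed)

def residual(w, e, target):
    # Gradient of the quadratic loss w.r.t. either w or e
    # (both equal w + e - target).
    return w + e - target

def adapt(w, target):
    # Fast-update: adapt only the task embedding; shared w stays fixed.
    e = np.zeros(DIM)
    for _ in range(K):
        e = e - INNER_LR * residual(w, e, target)
    return e

def adapted_loss(w, targets):
    # Mean post-adaptation loss across tasks.
    return float(np.mean([0.5 * np.sum(residual(w, adapt(w, t), t) ** 2)
                          for t in targets]))

targets = [rng.normal(size=DIM) for _ in range(8)]  # stand-ins for sampled tasks
w = np.zeros(DIM)                                   # shared meta-learned parameters

loss_before = adapted_loss(w, targets)
for _ in range(200):                                # meta-training loop
    meta_grad = np.zeros(DIM)
    for t in targets:
        # First-order meta-gradient: ignore how the adapted embedding
        # depends on w, as in first-order MAML-style approximations.
        meta_grad += residual(w, adapt(w, t), t)
    w = w - META_LR * meta_grad / len(targets)
loss_after = adapted_loss(w, targets)
```

After meta-training, the shared parameters land where K embedding steps suffice to fit each sampled task well, so `loss_after` drops below `loss_before`; the same division of labor (fast per-task adaptation, slow shared update) is what Algorithm 1 in the paper organizes around VPG and PPO.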