Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning
Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, Alois Knoll
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks, including various robotic control and manipulation tasks. MoSS shows state-of-the-art results in asymptotic performance, sample and adaptation efficiency, and generalization robustness. |
| Researcher Affiliation | Collaboration | Mingyang Wang1, Zhenshan Bing1, Xiangtong Yao1, Shuai Wang2, Huang Kai3,4, Hang Su5, Chenguang Yang6, Alois Knoll1 — 1Department of Informatics, Technical University of Munich; 2Tencent Robotics X Lab; 3School of Computer Science and Engineering, Sun Yat-Sen University; 4Shenzhen Institute, Sun Yat-Sen University; 5Dipartimento di Elettronica, Politecnico di Milano; 6Bristol Robotics Laboratory, University of the West of England |
| Pseudocode | Yes | We also summarize the meta-training procedure of MoSS as pseudo-code in Algorithm 1. |
| Open Source Code | Yes | Implementation and videos available at https://sites.google.com/view/metarl-moss |
| Open Datasets | Yes | We evaluate the performance of MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks. |
| Dataset Splits | Yes | We evaluate the performance of MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks. Specifically, during meta-training, the algorithm has access to Ntrain tasks drawn from the task distribution p(M). At meta-test time, new tasks are also sampled from p(M). For example, in Cheetah-Vel-OOD, we train the agent on the velocity range of [2.0, 4.0] and test it on [1.0, 2.0] ∪ [4.0, 5.0] (a sketch of this split appears after the table). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were provided in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x), were provided in the paper. |
| Experiment Setup | No | The paper states, 'Other hyperparameters can be found in Appendix,' and thus does not provide specific experimental setup details in the main text. |
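
The Cheetah-Vel-OOD split quoted above trains on target velocities in [2.0, 4.0] and tests on the out-of-distribution ranges [1.0, 2.0] and [4.0, 5.0]. The following is a minimal sketch, not the authors' code, of how such a train/OOD-test task split could be generated; the task counts (`n_train`, `n_test`), the uniform sampler, and the function names are illustrative assumptions.

```python
import random

# In-distribution training range and out-of-distribution test ranges
# for Cheetah-Vel-OOD, as described in the paper's example.
TRAIN_RANGE = (2.0, 4.0)
OOD_TEST_RANGES = [(1.0, 2.0), (4.0, 5.0)]


def sample_train_tasks(n_train: int = 100, seed: int = 0) -> list[float]:
    """Sample target velocities for meta-training from the in-distribution range."""
    rng = random.Random(seed)
    return [rng.uniform(*TRAIN_RANGE) for _ in range(n_train)]


def sample_ood_test_tasks(n_test: int = 30, seed: int = 1) -> list[float]:
    """Sample target velocities for meta-testing from the OOD ranges."""
    rng = random.Random(seed)
    return [rng.uniform(*rng.choice(OOD_TEST_RANGES)) for _ in range(n_test)]


if __name__ == "__main__":
    train_velocities = sample_train_tasks()
    test_velocities = sample_ood_test_tasks()
    print(f"train velocities span [{min(train_velocities):.2f}, {max(train_velocities):.2f}]")
    print(f"test velocities span  [{min(test_velocities):.2f}, {max(test_velocities):.2f}]")
```

Under this assumed setup, each sampled velocity defines one task; the meta-test velocities deliberately fall outside the training range to probe generalization robustness, which is the point of the OOD evaluation quoted above.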