Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, Alois Knoll

AAAI 2023

Reproducibility Variable Result LLM Response
Research Type | Experimental | "We evaluate MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks, including various robotic control and manipulation tasks. MoSS shows state-of-the-art results in asymptotic performance, sample and adaptation efficiency, and generalization robustness."
Researcher Affiliation | Collaboration | Mingyang Wang (1), Zhenshan Bing (1), Xiangtong Yao (1), Shuai Wang (2), Huang Kai (3, 4), Hang Su (5), Chenguang Yang (6), Alois Knoll* (1). Affiliations: (1) Department of Informatics, Technical University of Munich; (2) Tencent Robotics X Lab; (3) School of Computer Science and Engineering, Sun Yat-Sen University; (4) Shenzhen Institute, Sun Yat-Sen University; (5) Dipartimento di Elettronica, Politecnico di Milano; (6) Bristol Robotics Laboratory, University of the West of England.
Pseudocode | Yes | "We also summarize the meta-training procedure of MoSS as pseudo-code in Algorithm 1."
Open Source Code | Yes | "Implementation and videos available at https://sites.google.com/view/metarl-moss"
Open Datasets | Yes | "We evaluate the performance of MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks."
Dataset Splits | Yes | "We evaluate the performance of MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks. Specifically, during meta-training, the algorithm has access to Ntrain tasks drawn from the task distribution p(M). At meta-test time, new tasks are also sampled from p(M). For example, in Cheetah-Vel-OOD, we train the agent on the velocity range of [2.0, 4.0] and test it on [1.0, 2.0] ∪ [4.0, 5.0]."
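The Cheetah-Vel-OOD split quoted above can be sketched as a simple task sampler: training tasks draw target velocities from [2.0, 4.0], while out-of-distribution test tasks draw from [1.0, 2.0] or [4.0, 5.0]. This is an illustrative sketch only; the function and variable names are hypothetical, not taken from the MoSS codebase.

```python
import random

# Velocity ranges for the Cheetah-Vel-OOD split described in the paper.
TRAIN_RANGE = (2.0, 4.0)
TEST_RANGES = [(1.0, 2.0), (4.0, 5.0)]

def sample_train_velocity(rng):
    """Sample a meta-training task (target velocity) from the in-distribution range."""
    lo, hi = TRAIN_RANGE
    return rng.uniform(lo, hi)

def sample_test_velocity(rng):
    """Sample a meta-test task from one of the out-of-distribution ranges."""
    lo, hi = rng.choice(TEST_RANGES)
    return rng.uniform(lo, hi)

rng = random.Random(0)  # fixed seed for a reproducible task set
train_tasks = [sample_train_velocity(rng) for _ in range(5)]
test_tasks = [sample_test_velocity(rng) for _ in range(5)]
```

Because the test ranges lie entirely outside the training range, any adaptation success on the test tasks measures extrapolation rather than interpolation.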
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments were provided in the paper.
Software Dependencies | No | No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x), were provided in the paper.
Experiment Setup | No | The paper states, "Other hyperparameters can be found in Appendix.", deferring the specific experimental setup details to the appendix rather than providing them in the main text.