reproducibilityindex.ai

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, Alois Knoll

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks, including various robotic control and manipulation tasks. MoSS shows state-of-the-art results in asymptotic performance, sample and adaptation efficiency, and generalization robustness.
Researcher Affiliation	Collaboration	Mingyang Wang1, Zhenshan Bing1, Xiangtong Yao1, Shuai Wang2, Huang Kai3,4, Hang Su5, Chenguang Yang6, *Alois Knoll1; 1Department of Informatics, Technical University Munich; 2Tencent Robotics X Lab; 3School of Computer Science and Engineering, Sun Yat-Sen University; 4Shenzhen Institute, Sun Yat-Sen University; 5Dipartimento di Elettronica, Politecnico di Milano; 6Bristol Robotics Laboratory, University of the West of England
Pseudocode	Yes	We also summarize the meta-training procedure of MoSS as pseudo-code in Algorithm 1.
Open Source Code	Yes	Implementation and videos available at https://sites.google.com/view/metarl-moss
Open Datasets	Yes	We evaluate the performance of MoSS on Mujoco (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks.
Dataset Splits	Yes	We evaluate the performance of MoSS on Mujoco (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks. Specifically, during meta-training, the algorithm has access to N_train tasks drawn from the task distribution p(M). At meta-test time, new tasks are also sampled from p(M). For example, in Cheetah-Vel-OOD, we train the agent on the velocity range of [2.0, 4.0] and test it on [1.0, 2.0] ∪ [4.0, 5.0].
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were provided in the paper.
Software Dependencies	No	No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x), were provided in the paper.
Experiment Setup	No	The main text defers hyperparameters to the appendix ('Other hyperparameters can be found in Appendix.') and does not itself provide specific experimental setup details.
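
The Cheetah-Vel-OOD split quoted under Dataset Splits can be illustrated with a short sketch. This is not the authors' implementation: only the velocity ranges come from the quoted text, while the function names, the uniform sampling, the task counts, and the seed are assumptions made for illustration.

```python
import random

# Velocity ranges quoted from the paper for Cheetah-Vel-OOD.
TRAIN_RANGE = (2.0, 4.0)
TEST_RANGES = [(1.0, 2.0), (4.0, 5.0)]


def sample_train_velocities(n_tasks, rng):
    """Draw meta-training target velocities (uniform sampling is an assumption)."""
    low, high = TRAIN_RANGE
    return [rng.uniform(low, high) for _ in range(n_tasks)]


def sample_test_velocities(n_tasks, rng):
    """Draw out-of-distribution target velocities from the union of the test ranges."""
    velocities = []
    for _ in range(n_tasks):
        # Both test intervals have equal width, so picking one at random
        # and sampling inside it is uniform over their union.
        low, high = rng.choice(TEST_RANGES)
        velocities.append(rng.uniform(low, high))
    return velocities


rng = random.Random(0)  # hypothetical seed, not from the paper
train_velocities = sample_train_velocities(100, rng)  # task count is an assumption
test_velocities = sample_test_velocities(30, rng)     # task count is an assumption
```

Keeping the test ranges as a list of intervals makes the held-out region explicit, so a check that no meta-test task falls strictly inside the training range is a one-line assertion.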