Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning
Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, Alois Knoll
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MoSS on MuJoCo (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks, including various robotic control and manipulation tasks. MoSS shows state-of-the-art results in asymptotic performance, sample and adaptation efficiency, and generalization robustness. |
| Researcher Affiliation | Collaboration | Mingyang Wang1, Zhenshan Bing1, Xiangtong Yao1, Shuai Wang2, Huang Kai3,4, Hang Su5, Chenguang Yang6, *Alois Knoll1 1Department of Informatics, Technical University Munich, 2Tencent Robotics X Lab, 3School of Computer Science and Engineering, Sun Yat-Sen University, 4Shenzhen Institute, Sun Yat-Sen University, 5Dipartimento di Elettronica, Politecnico di Milano, 6Bristol Robotics Laboratory, University of the West of England |
| Pseudocode | Yes | We also summarize the meta-training procedure of MoSS as pseudo-code in Algorithm 1. |
| Open Source Code | Yes | Implementation and videos available at https://sites.google.com/view/metarl-moss |
| Open Datasets | Yes | We evaluate the performance of MoSS on Mujoco (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks. |
| Dataset Splits | Yes | We evaluate the performance of MoSS on Mujoco (Todorov, Erez, and Tassa 2012) and Meta-World (Yu et al. 2020) benchmarks. Specifically, during meta-training, the algorithm has access to N_train tasks drawn from the task distribution p(M). At meta-test time, new tasks are also sampled from p(M). For example, in Cheetah-Vel-OOD, we train the agent on the velocity range of [2.0, 4.0] and test it on [1.0, 2.0] ∪ [4.0, 5.0]. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were provided in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x), were provided in the paper. |
| Experiment Setup | No | The paper states 'Other hyperparameters can be found in Appendix.' and therefore does not provide specific experimental setup details in the main text. |
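
The Cheetah-Vel-OOD split quoted in the Dataset Splits row can be illustrated with a small sketch. This is not the authors' code: the function names, the task counts, and the choice of uniform sampling within each range are assumptions made for illustration only. The velocity ranges are the ones quoted above.

```python
import random

# Velocity ranges quoted in the Dataset Splits row for Cheetah-Vel-OOD.
TRAIN_RANGE = (2.0, 4.0)
TEST_RANGES = ((1.0, 2.0), (4.0, 5.0))


def sample_train_velocities(n_tasks, rng):
    """Draw target velocities for meta-training tasks (uniform sampling is an assumption)."""
    low, high = TRAIN_RANGE
    return [rng.uniform(low, high) for _ in range(n_tasks)]


def sample_test_velocities(n_tasks, rng):
    """Draw out-of-distribution target velocities for meta-test tasks.

    Both test intervals have the same width, so picking one interval at random
    and then sampling uniformly inside it is uniform over their union.
    """
    velocities = []
    for _ in range(n_tasks):
        low, high = rng.choice(TEST_RANGES)
        velocities.append(rng.uniform(low, high))
    return velocities


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the hypothetical split is repeatable
    print(sample_train_velocities(5, rng))
    print(sample_test_velocities(5, rng))
```

Fixing the seed keeps the sampled task set repeatable across runs. The table does not say how many tasks the paper uses or how it samples them, so the counts here are placeholders.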