Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Multi-Task Reinforcement Learning with Context-based Representations
Authors: Shagun Sodhani, Amy Zhang, Joelle Pineau
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks." and (Section 4, Experiments) "We now empirically evaluate the effectiveness of the proposed CARE model on Meta-World (Yu et al., 2020b), a multi-task RL benchmark with 50 tasks." |
| Researcher Affiliation | Collaboration | Shagun Sodhani (1), Amy Zhang (1, 2, 3), Joelle Pineau (1, 2, 3). Affiliations: (1) Facebook AI Research, (2) Mila, (3) McGill University. |
| Pseudocode | Yes | The overall algorithm is described in Algorithm 1, with the sub-function details available in the Appendix (Algorithms 2, 3, 4). |
| Open Source Code | Yes | The implementation of the algorithms is available at https://github.com/facebookresearch/mtrl. |
| Open Datasets | Yes | "We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks." and "This brings us to our setting for evaluation, Meta-World (Yu et al., 2020b), as a natural instantiation of a BC-MDP." |
| Dataset Splits | No | The paper uses the Meta-World benchmark which consists of distinct tasks, and experiments are conducted on these tasks (e.g., MT10, MT50 setups), but it does not explicitly detail the train/validation/test dataset splits for the data within these tasks, nor does it refer to predefined splits from the Meta-World benchmark paper with a direct citation for the split methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models, memory, or specific cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as the 'RoBERTa model', 'Soft Actor-Critic (SAC)', 'PyTorch', and 'NumPy', but it does not specify version numbers for any of these dependencies. |
| Experiment Setup | Yes | "The overall algorithm is described in Algorithm 1, with the sub-function details available in the Appendix (Algorithms 2, 3, 4). The architecture diagram is shown in Figure 3 and the Appendix contains additional implementation details (Appendix A) and hyper-parameters (Appendix B)." and "The agent is trained for multiple seeds (10 in our case) resulting in 10 different time-series (one per seed) of mean success rates." and "...evaluating every agent at a fixed frequency (once every 10K environment steps, per task)." and "...after training for 2 million steps (for each environment)." and "...after training for 100 thousand steps (for each environment)." |
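The evaluation protocol quoted in the Experiment Setup row can be sketched as follows. This is a minimal, hypothetical illustration in Python, not code from the CARE repository (https://github.com/facebookresearch/mtrl). Only the 10K-step evaluation interval, the 2-million-step budget, and the 10 seeds come from the paper. Everything else is an assumption: the function names, per-task success rates lying in [0, 1], the first evaluation happening at 10K steps rather than at step 0, and summarizing seeds with the mean and sample standard deviation.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

EVAL_INTERVAL = 10_000     # env steps between evaluations, per task (from the paper)
TOTAL_STEPS = 2_000_000    # training budget per environment (from the paper)
NUM_SEEDS = 10             # independent training runs (from the paper)


def eval_schedule(total_steps: int = TOTAL_STEPS,
                  interval: int = EVAL_INTERVAL) -> List[int]:
    """Step counts at which the agent is evaluated.

    Assumption: the first evaluation is at `interval` steps, not at step 0.
    """
    return list(range(interval, total_steps + 1, interval))


def mean_success_series(per_task: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-task success rates into one time-series for a single seed.

    `per_task` maps a (hypothetical) task name to its success rate at each
    evaluation point; all tasks must share the same evaluation points.
    """
    series = list(per_task.values())
    if not series:
        raise ValueError("need at least one task")
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all tasks must have the same number of evaluation points")
    return [statistics.fmean(s[i] for s in series) for i in range(length)]


def aggregate_over_seeds(
        seed_series: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """(mean, sample std) across seeds at each evaluation point.

    Assumption: mean and standard deviation are used as the summary;
    statistics.stdev needs at least two seeds.
    """
    return [(statistics.fmean(point), statistics.stdev(point))
            for point in zip(*seed_series)]
```

With the default arguments the schedule has 200 evaluation points per task; the 100-thousand-step budget also quoted above would give 10.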