Multi-Task Reinforcement Learning with Context-based Representations
Authors: Shagun Sodhani, Amy Zhang, Joelle Pineau
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks." and "4. Experiments: We now empirically evaluate the effectiveness of the proposed CARE model on Meta-World (Yu et al., 2020b), a multi-task RL benchmark with 50 tasks." |
| Researcher Affiliation | Collaboration | "Shagun Sodhani¹, Amy Zhang¹²³, Joelle Pineau¹²³ (¹Facebook AI Research, ²Mila, ³McGill University)." |
| Pseudocode | Yes | "The overall algorithm is described in Algorithm 1, with the sub-function details available in the Appendix (Algorithms 2, 3, 4)." |
| Open Source Code | Yes | "The implementation of the algorithms is available at https://github.com/facebookresearch/mtrl." |
| Open Datasets | Yes | "We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks." and "This brings us to our setting for evaluation, Meta-World (Yu et al., 2020b), as a natural instantiation of a BC-MDP." |
| Dataset Splits | No | The paper uses the Meta-World benchmark, and experiments are conducted on its task suites (e.g., the MT10 and MT50 setups), but it does not explicitly detail train/validation/test splits within these tasks, nor does it cite a predefined split methodology from the Meta-World benchmark paper. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models, memory, or specific cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as the 'RoBERTa model', 'Soft Actor-Critic (SAC)', 'PyTorch', and 'NumPy', but it does not specify version numbers for any of these dependencies. |
| Experiment Setup | Yes | "The overall algorithm is described in Algorithm 1, with the sub-function details available in the Appendix (Algorithms 2, 3, 4). The architecture diagram is shown in Figure 3 and the Appendix contains additional implementation details (Appendix A) and hyper-parameters (Appendix B)." and "The agent is trained for multiple seeds (10 in our case), resulting in 10 different time-series (one per seed) of mean success rates." and "evaluating every agent at a fixed frequency (once every 10K environment steps, per task)" and "after training for 2 million steps (for each environment)" and "after training for 100 thousand steps (for each environment)". |
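The evaluation protocol quoted above (10 seeds, evaluation once every 10K environment steps, 2 million steps per environment) can be sketched as follows. This is a minimal aggregation sketch, not code from the paper's repository; the success-rate array is randomly generated placeholder data standing in for real per-seed evaluation logs.

```python
import numpy as np

# Placeholder evaluation log: success rates for 10 seeds, evaluated once
# every 10K environment steps over a 2M-step budget (200 evaluation points).
# In practice these values would come from the actual training runs.
rng = np.random.default_rng(0)
num_seeds, num_evals = 10, 200
success = rng.random((num_seeds, num_evals))  # values in [0, 1]

# Aggregate across seeds: one time-series of mean success rate per
# evaluation point, plus the standard error across the 10 seeds.
mean_success = success.mean(axis=0)
std_err = success.std(axis=0, ddof=1) / np.sqrt(num_seeds)

# Environment steps at which each evaluation happens (10K apart, per task).
eval_steps = np.arange(1, num_evals + 1) * 10_000
```

The same aggregation applies to the shorter 100K-step runs mentioned in the paper by reducing `num_evals` to 10.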