Multi-Task Reinforcement Learning with Context-based Representations

Authors: Shagun Sodhani, Amy Zhang, Joelle Pineau

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks." and "4. Experiments: We now empirically evaluate the effectiveness of the proposed CARE model on Meta-World (Yu et al., 2020b), a multi-task RL benchmark with 50 tasks."
Researcher Affiliation | Collaboration | Shagun Sodhani (Facebook AI Research); Amy Zhang (Facebook AI Research, Mila, McGill University); Joelle Pineau (Facebook AI Research, Mila, McGill University).
Pseudocode | Yes | "The overall algorithm is described in Algorithm 1, with the sub-function details available in the Appendix (Algorithms 2, 3, 4)."
Open Source Code | Yes | "The implementation of the algorithms is available at https://github.com/facebookresearch/mtrl."
Open Datasets | Yes | "We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks." and "This brings us to our setting for evaluation, Meta-World (Yu et al., 2020b), as a natural instantiation of a BC-MDP." (A sketch of loading the Meta-World task suite appears after this table.)
Dataset Splits | No | The paper uses the Meta-World benchmark, running experiments on its MT10 and MT50 task suites, but it does not specify train/validation/test splits within these tasks, nor does it cite predefined splits from the Meta-World benchmark paper.
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions software such as the RoBERTa model, Soft Actor-Critic (SAC), PyTorch, and NumPy, but it does not specify version numbers for any of these dependencies.
Experiment Setup | Yes | "The overall algorithm is described in Algorithm 1, with the sub-function details available in the Appendix (Algorithms 2, 3, 4). The architecture diagram is shown in Figure 3 and the Appendix contains additional implementation details (Appendix A) and hyper-parameters (Appendix B)." and "evaluating every agent at a fixed frequency (once every 10K environment steps, per task)" and "The agent is trained for multiple seeds (10 in our case) resulting in 10 different time-series (one per seed) of mean success rates." and "after training for 2 million steps (for each environment)" and "after training for 100 thousand steps (for each environment)". (A sketch of this evaluation protocol appears below the table.)
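
The Open Datasets row points to the Meta-World benchmark. Below is a minimal sketch of instantiating the MT10 task suite with the publicly released `metaworld` package; the class and attribute names (`metaworld.MT10`, `train_classes`, `train_tasks`, `set_task`) reflect that package's public API and are an assumption here, since the paper's experiments run through the authors' wrapper in the mtrl repository.

```python
# Minimal sketch: instantiate the Meta-World MT10 task suite.
# Assumes the public `metaworld` package; the paper's experiments use the
# authors' wrapper from https://github.com/facebookresearch/mtrl instead.
import random
import metaworld

mt10 = metaworld.MT10()  # benchmark object exposing 10 manipulation tasks

envs = {}
for name, env_cls in mt10.train_classes.items():
    env = env_cls()
    # Every environment must be assigned one of its parametric task variants
    # (e.g. goal positions) before it can be reset and stepped.
    task = random.choice([t for t in mt10.train_tasks if t.env_name == name])
    env.set_task(task)
    envs[name] = env

print(f"Instantiated {len(envs)} MT10 tasks: {sorted(envs)}")
```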
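
The Experiment Setup row describes the evaluation protocol: agents are trained with 10 seeds, evaluated once every 10K environment steps per task, and scored by mean success rate, with headline numbers reported after 2 million training steps. The sketch below illustrates that aggregation only; the agent object, its `act` method, and the gym-style environment interface are hypothetical placeholders, not the mtrl implementation.

```python
# Minimal sketch of the reported evaluation protocol (an assumption about
# structure, not the mtrl code): 10 seeds, evaluation every 10K environment
# steps per task, success rate averaged over tasks and then over seeds.
import numpy as np

NUM_SEEDS = 10
EVAL_EVERY = 10_000        # environment steps per task between evaluations
TOTAL_STEPS = 2_000_000    # training budget per environment in the paper


def mean_success_rate(agent, envs, episodes_per_task=10):
    """Average success rate over tasks, using a gym-style 4-tuple step API."""
    per_task = []
    for env in envs:
        successes = 0
        for _ in range(episodes_per_task):
            obs, done, info = env.reset(), False, {}
            while not done:
                obs, reward, done, info = env.step(agent.act(obs))
            successes += int(info.get("success", 0))
        per_task.append(successes / episodes_per_task)
    return float(np.mean(per_task))


# One time series of mean success rates per seed, filled in during training
# (one entry per evaluation); the reported curves average these NUM_SEEDS
# series and show their spread.
curves = np.zeros((NUM_SEEDS, TOTAL_STEPS // EVAL_EVERY))
```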