MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning
Authors: Kevin Li, Abhishek Gupta, Ashwin Reddy, Vitchyr H Pong, Aurick Zhou, Justin Yu, Sergey Levine
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experimental Evaluation In our experimental evaluation we aim to answer the following questions: (1) Can MURAL make effective use of successful outcome examples to solve challenging exploration tasks? (2) Does MURAL scale to dynamically complex tasks? (3) What are the impacts of different design decisions on the effectiveness of MURAL? Further details, videos, and code can be found at https://sites.google.com/view/mural-rl |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, USA. Correspondence to: Kevin Li <kevintli@berkeley.edu>, Abhishek Gupta <abhigupta@berkeley.edu>. |
| Pseudocode | Yes | "Algorithm 1 RL with CNML-Based Success Classifiers" and "Algorithm 2 MURAL: Meta-learning Uncertainty-aware Rewards for Automated Outcome-driven RL" |
| Open Source Code | Yes | Further details, videos, and code can be found at https://sites.google.com/view/mural-rl |
| Open Datasets | No | The paper describes various environments and tasks (e.g., maze navigation, robotic manipulation with Sawyer robot, quadruped ant locomotion) rather than specific, named public datasets with explicit access information (link, DOI, citation with author/year). The problem is framed as 'outcome-driven RL' where successful outcomes are provided by the user and on-policy samples are collected, not a fixed pre-existing dataset. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages, sample counts, or citations for train/validation/test dataset splits needed to reproduce the experiment. It describes how data is sampled during the reinforcement learning process for the classifier but not a fixed, reproducible dataset partition. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | No | The paper states, 'Further details are in Appendix A.2' and 'More details are included in Appendix A.4 and A.6' for experimental setup, but these appendices are not part of the main text, so it gives no specific hyperparameter values or training configurations. |