MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Authors: Kevin Li, Abhishek Gupta, Ashwin Reddy, Vitchyr H Pong, Aurick Zhou, Justin Yu, Sergey Levine

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6. Experimental Evaluation: In our experimental evaluation we aim to answer the following questions: (1) Can MURAL make effective use of successful outcome examples to solve challenging exploration tasks? (2) Does MURAL scale to dynamically complex tasks? (3) What are the impacts of different design decisions on the effectiveness of MURAL? Further details, videos, and code can be found at https://sites.google.com/view/mural-rl"
Researcher Affiliation | Academia | "Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, USA. Correspondence to: Kevin Li <kevintli@berkeley.edu>, Abhishek Gupta <abhigupta@berkeley.edu>."
Pseudocode | Yes | "Algorithm 1 RL with CNML-Based Success Classifiers" and "Algorithm 2 MURAL: Meta-learning Uncertainty-aware Rewards for Automated Outcome-driven RL" (a hedged sketch of this training loop is given after the table below)
Open Source Code | Yes | "Further details, videos, and code can be found at https://sites.google.com/view/mural-rl"
Open Datasets | No | The paper describes environments and tasks (e.g., maze navigation, robotic manipulation with a Sawyer arm, quadruped ant locomotion) rather than specific, named public datasets with explicit access information (link, DOI, or citation with author/year). The problem is framed as outcome-driven RL, in which successful outcome examples are provided by the user and on-policy samples are collected during training, rather than drawn from a fixed pre-existing dataset.
Dataset Splits | No | The paper does not provide specific percentages, sample counts, or citations for train/validation/test splits needed to reproduce the experiments. It describes how data is sampled for the classifier during reinforcement learning, but not a fixed, reproducible dataset partition.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models, processor types, or memory amounts, used to run its experiments.
Software Dependencies | No | The paper does not provide ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper states "Further details are in Appendix A.2" and "More details are included in Appendix A.4 and A.6" for the experimental setup, but the main text itself does not give specific hyperparameter values or training configurations.
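To make the pseudocode row above concrete, here is a minimal sketch (not the authors' code) of the idea behind "Algorithm 1 RL with CNML-Based Success Classifiers": the reward for a queried state is the conditional normalized maximum likelihood (CNML) probability that the state is a successful outcome, computed against the user-provided success examples and recently visited on-policy states. The paper amortizes CNML with meta-learning (Algorithm 2); this sketch instead computes it naively by refitting a small logistic classifier once per candidate label, which is only practical for toy data. All names (cnml_success_reward, toy_success_examples, and so on) are illustrative, not taken from the paper.

```python
# Hedged sketch of CNML-based success rewards on toy 2D "states".
# Assumption: success examples and on-policy states are plain feature vectors.
import numpy as np


def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit a logistic regression (weights + bias) by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def predict(w, b, x):
    """Probability of the positive (success) label for a single state x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def cnml_success_reward(query_state, success_examples, policy_states):
    """CNML probability that `query_state` is a success.

    For each candidate label y' in {0, 1}: append (query_state, y') to the
    training set, refit the classifier, and evaluate the probability it
    assigns to y' at the query; then normalize across the two candidates.
    """
    X = np.vstack([success_examples, policy_states])
    y = np.concatenate([np.ones(len(success_examples)),
                        np.zeros(len(policy_states))])
    likelihoods = []
    for label in (0.0, 1.0):
        w, b = fit_logistic(np.vstack([X, query_state]),
                            np.concatenate([y, [label]]))
        p_pos = predict(w, b, query_state)
        likelihoods.append(p_pos if label == 1.0 else 1.0 - p_pos)
    return likelihoods[1] / (likelihoods[0] + likelihoods[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: successes cluster near (1, 1); on-policy states near the origin.
    toy_success_examples = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(20, 2))
    toy_policy_states = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))

    for s in ([0.0, 0.0], [0.5, 0.5], [1.0, 1.0]):
        r = cnml_success_reward(np.array(s), toy_success_examples, toy_policy_states)
        print(f"state {s} -> CNML success reward {r:.3f}")
```

In the full MURAL method, these per-query refits are replaced by a meta-learned classifier so that the CNML probability can be produced in a few gradient steps per query, and that probability is used directly as the reward for a standard RL algorithm.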