MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Authors: Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks." MetaCURE is extensively evaluated on various sparse-reward MuJoCo locomotion tasks as well as sparse-reward Meta-World tasks. Empirical results show that it outperforms baseline algorithms by a large margin.
Researcher Affiliation | Collaboration | 1. Institute for Interdisciplinary Information Sciences, Tsinghua University, China; 2. Fuxi AI Lab, NetEase, China.
Pseudocode | Yes | Pseudo-code for meta-training and adaptation is provided in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | Yes | Our implementation code is available at https://github.com/NagisaZj/MetaCURE-Public.
Open Datasets | Yes | These tasks (except for Point-Robot-Sparse) are simulated via MuJoCo (Todorov et al., 2012) and are benchmarks commonly used by current meta-learning algorithms (Mishra et al., 2018; Finn et al., 2017; Rothfuss et al., 2019; Rakelly et al., 2019). We evaluate MetaCURE as well as baselines on two Meta-World task sets: Reach and Reach-Wall.
Dataset Splits | No | The paper refers to 'meta-training tasks' and 'meta-testing tasks' and states that 'Detailed parameters and reward function settings are deferred to Appendix C', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts of data points) for reproduction.
Hardware Specification | No | The paper mentions evaluating on MuJoCo and Meta-World tasks, implying computational resources were used, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using SAC (Soft Actor-Critic) but does not provide specific version numbers for any software dependencies such as programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | No | The paper states 'Detailed parameters and reward function settings are deferred to Appendix C' and 'Additional implementation details are deferred to Appendix B' without providing specific hyperparameters or system-level training settings in the main body of the paper.