MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Authors: Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks." MetaCURE is extensively evaluated on various sparse-reward MuJoCo locomotion tasks as well as sparse-reward Meta-World tasks. Empirical results show that it outperforms baseline algorithms by a large margin.
Researcher Affiliation | Collaboration | 1. Institute for Interdisciplinary Information Sciences, Tsinghua University, China; 2. Fuxi AI Lab, NetEase, China.
Pseudocode | Yes | Pseudo-code for meta-training and adaptation is provided in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | Yes | Our implementation code is available at https://github.com/NagisaZj/MetaCURE-Public.
Open Datasets | Yes | These tasks (except for Point-Robot-Sparse) are simulated via MuJoCo (Todorov et al., 2012) and are benchmarks commonly used by current meta-learning algorithms (Mishra et al., 2018; Finn et al., 2017; Rothfuss et al., 2019; Rakelly et al., 2019). We evaluate MetaCURE as well as baselines on two Meta-World task sets: Reach and Reach-Wall.
Dataset Splits | No | The paper refers to 'meta-training tasks' and 'meta-testing tasks' and states that 'Detailed parameters and reward function settings are deferred to Appendix C', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts of data points) for reproduction.
Hardware Specification | No | The paper mentions evaluating on MuJoCo and Meta-World tasks, implying computational resources were used, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using SAC (Soft Actor-Critic) but does not provide specific version numbers for any software dependencies such as programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | No | The paper states 'Detailed parameters and reward function settings are deferred to Appendix C' and 'Additional implementation details are deferred to Appendix B' without providing specific hyperparameters or system-level training settings in the main body of the paper.