Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices

Authors: Evan Z Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, DREAM substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation. ... Empirically, we stress test DREAM's ability to learn sophisticated exploration strategies on 3 challenging, didactic benchmarks and a sparse-reward 3D visual navigation benchmark. On these, DREAM learns to optimally explore and exploit, achieving 90% higher returns than existing state-of-the-art approaches (PEARL, E-RL², IMPORT, VARIBAD), which struggle to learn an effective exploration strategy (Section 6)."
Researcher Affiliation | Collaboration | "Evan Zheran Liu¹, Aditi Raghunathan¹, Percy Liang¹, Chelsea Finn¹. ¹Department of Computer Science, Stanford University. Correspondence to: Evan Zheran Liu <evanliu@cs.stanford.edu>. ... This work was also supported in part by Google."
Pseudocode | Yes (see the trial sketch after the table) | "Algorithm 1 DREAM meta-training trial"
Open Source Code | Yes | "Our code is publicly available at https://github.com/ezliu/dream."
Open Datasets | Yes | "We design new benchmarks meeting the above criteria, testing (i-iii) with didactic benchmarks, and (iv) with a sparse-reward 3D visual navigation benchmark, based on Kamienny et al. (2020), that combines complex exploration with high-dimensional visual inputs. ... Chevalier-Boisvert, M. Gym-Miniworld environment for OpenAI Gym. https://github.com/maximecb/gym-miniworld, 2018."
Dataset Splits | Yes | "We evaluate each approach on 100 meta-testing trials, every 2K meta-training trials. ... We hold out 1 of the 4³ = 64 problems for meta-testing."
Hardware Specification | No | The information is insufficient. The paper does not explicitly mention any specific hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The information is insufficient. While the paper cites PyTorch (Paszke et al., 2017) for its policy implementation, it does not list any software dependencies with specific version numbers in the main text or appendices.
Experiment Setup | Yes (see the evaluation sketch after the table) | "We report the average returns achieved by each approach in trials with one exploration and one exploitation episode, averaged over 3 seeds with 1-standard deviation error bars (full details in Appendix B.3)."
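
Trial sketch. The Pseudocode and Experiment Setup rows describe trials built from one exploration episode followed by one exploitation episode, handled by decoupled policies. The sketch below is only a reading aid for that trial structure, not the paper's Algorithm 1: every name (`run_episode`, `meta_trial`, `encode_trajectory`, the policy callables) is a placeholder rather than an identifier from https://github.com/ezliu/dream, and the training objectives are omitted entirely.

```python
from typing import Callable, List, Tuple

# A transition is (observation, action, reward); types are deliberately loose.
Transition = Tuple[object, object, float]
EnvStep = Callable[[object], Tuple[object, float, bool]]  # action -> (obs, reward, done)


def run_episode(env_step: EnvStep, policy: Callable[[object], object],
                initial_obs: object, max_steps: int = 100) -> List[Transition]:
    """Roll out a single episode with the given policy."""
    obs, trajectory = initial_obs, []
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env_step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


def meta_trial(reset_env: Callable[[], Tuple[EnvStep, object]],
               exploration_policy: Callable[[object], object],
               exploitation_policy: Callable[[object, object], object],
               encode_trajectory: Callable[[List[Transition]], object]) -> float:
    """One trial: explore, summarize what was learned, then exploit."""
    # Episode 1 (exploration): gather task-identifying information; the
    # reward collected here is not what gets reported.
    env_step, obs = reset_env()
    exploration_traj = run_episode(env_step, exploration_policy, obs)

    # Compress the exploration episode into a task representation.
    task_code = encode_trajectory(exploration_traj)

    # Episode 2 (exploitation): act conditioned on the task representation;
    # the returns of this episode are the ones reported in the paper's plots.
    env_step, obs = reset_env()
    exploitation_traj = run_episode(
        env_step, lambda o: exploitation_policy(o, task_code), obs)
    return sum(reward for _, _, reward in exploitation_traj)
```

The decoupling in the title refers to the exploration and exploitation policies being separate objects with separate objectives; how those objectives are tied together during meta-training is the substance of the paper and is not reproduced in this sketch.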
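
Evaluation sketch. The Dataset Splits and Experiment Setup rows pin down a small evaluation protocol: 1 of 4³ = 64 problems held out for meta-testing, evaluation on 100 meta-testing trials every 2K meta-training trials, and returns averaged over 3 seeds with 1-standard-deviation error bars. A minimal sketch of that bookkeeping follows; the factor structure of the 64 problems is assumed purely so the count works out and is not a claim about the benchmark itself.

```python
import itertools

import numpy as np

# 4**3 = 64 problems; modeling them as three 4-valued factors is an assumption
# made only for illustration.
problems = list(itertools.product(range(4), repeat=3))
held_out = problems[-1]                                 # 1 problem for meta-testing
train_problems = [p for p in problems if p != held_out]
assert len(problems) == 64 and len(train_problems) == 63

NUM_SEEDS = 3            # seeds averaged in the reported curves
EVAL_EVERY = 2_000       # evaluate every 2K meta-training trials
NUM_EVAL_TRIALS = 100    # meta-testing trials per evaluation point


def summarize_returns(returns_per_seed: np.ndarray):
    """Aggregate meta-test returns of shape (NUM_SEEDS, num_checkpoints):
    the reported curve is the mean over seeds, the error bars are 1 std."""
    assert returns_per_seed.shape[0] == NUM_SEEDS
    return returns_per_seed.mean(axis=0), returns_per_seed.std(axis=0)
```

How the agents are trained between evaluation points is intentionally left out; the quoted text only fixes the split size and the reporting statistics.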