Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices

Authors: Evan Z Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, DREAM substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation. ... Empirically, we stress test DREAM's ability to learn sophisticated exploration strategies on 3 challenging, didactic benchmarks and a sparse-reward 3D visual navigation benchmark. On these, DREAM learns to optimally explore and exploit, achieving 90% higher returns than existing state-of-the-art approaches (PEARL, E-RL², IMPORT, VARIBAD), which struggle to learn an effective exploration strategy (Section 6)."
Researcher Affiliation | Collaboration | "Evan Zheran Liu¹, Aditi Raghunathan¹, Percy Liang¹, Chelsea Finn¹. ¹Department of Computer Science, Stanford University. Correspondence to: Evan Zheran Liu <evanliu@cs.stanford.edu>. ... This work was also supported in part by Google."
Pseudocode | Yes (see the trial sketch after the table) | "Algorithm 1 DREAM meta-training trial"
Open Source Code | Yes | "Our code is publicly available at https://github.com/ezliu/dream."
Open Datasets | Yes | "We design new benchmarks meeting the above criteria, testing (i-iii) with didactic benchmarks, and (iv) with a sparse-reward 3D visual navigation benchmark, based on Kamienny et al. (2020), that combines complex exploration with high-dimensional visual inputs. ... Chevalier-Boisvert, M. Gym-Miniworld environment for OpenAI Gym. https://github.com/maximecb/gym-miniworld, 2018."
Dataset Splits | Yes | "We evaluate each approach on 100 meta-testing trials, every 2K meta-training trials. ... We hold out 1 of the 4³ = 64 problems for meta-testing."
Hardware Specification | No | The information is insufficient. The paper does not explicitly mention any specific hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The information is insufficient. While the paper cites PyTorch (Paszke et al., 2017) for its policy implementation, it does not list any software dependencies with specific version numbers in the main text or appendices.
Experiment Setup | Yes (see the evaluation sketch after the table) | "We report the average returns achieved by each approach in trials with one exploration and one exploitation episode, averaged over 3 seeds with 1-standard deviation error bars (full details in Appendix B.3)."
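
Trial sketch. The Pseudocode and Experiment Setup rows describe trials built from one exploration episode followed by one exploitation episode, handled by decoupled policies. The sketch below is only a reading aid for that trial structure, not the paper's Algorithm 1: every name (`run_episode`, `meta_trial`, `encode_trajectory`, the policy callables) is a placeholder rather than an identifier from https://github.com/ezliu/dream, and the training objectives are omitted entirely.

```python
from typing import Callable, List, Tuple

# A transition is (observation, action, reward); types are deliberately loose.
Transition = Tuple[object, object, float]
EnvStep = Callable[[object], Tuple[object, float, bool]]  # action -> (obs, reward, done)


def run_episode(env_step: EnvStep, policy: Callable[[object], object],
                initial_obs: object, max_steps: int = 100) -> List[Transition]:
    """Roll out a single episode with the given policy."""
    obs, trajectory = initial_obs, []
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env_step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


def meta_trial(reset_env: Callable[[], Tuple[EnvStep, object]],
               exploration_policy: Callable[[object], object],
               exploitation_policy: Callable[[object, object], object],
               encode_trajectory: Callable[[List[Transition]], object]) -> float:
    """One trial: explore, summarize what was learned, then exploit."""
    # Episode 1 (exploration): gather task-identifying information; the
    # reward collected here is not what gets reported.
    env_step, obs = reset_env()
    exploration_traj = run_episode(env_step, exploration_policy, obs)

    # Compress the exploration episode into a task representation.
    task_code = encode_trajectory(exploration_traj)

    # Episode 2 (exploitation): act conditioned on the task representation;
    # the returns of this episode are the ones reported in the paper's plots.
    env_step, obs = reset_env()
    exploitation_traj = run_episode(
        env_step, lambda o: exploitation_policy(o, task_code), obs)
    return sum(reward for _, _, reward in exploitation_traj)
```

The decoupling in the title refers to the exploration and exploitation policies being separate objects with separate objectives; how those objectives are tied together during meta-training is the substance of the paper and is not reproduced in this sketch.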
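
Evaluation sketch. The Dataset Splits and Experiment Setup rows pin down a small evaluation protocol: 1 of 4³ = 64 problems held out for meta-testing, evaluation on 100 meta-testing trials every 2K meta-training trials, and returns averaged over 3 seeds with 1-standard-deviation error bars. A minimal sketch of that bookkeeping follows; the factor structure of the 64 problems is assumed purely so the count works out and is not a claim about the benchmark itself.

```python
import itertools

import numpy as np

# 4**3 = 64 problems; modeling them as three 4-valued factors is an assumption
# made only for illustration.
problems = list(itertools.product(range(4), repeat=3))
held_out = problems[-1]                                 # 1 problem for meta-testing
train_problems = [p for p in problems if p != held_out]
assert len(problems) == 64 and len(train_problems) == 63

NUM_SEEDS = 3            # seeds averaged in the reported curves
EVAL_EVERY = 2_000       # evaluate every 2K meta-training trials
NUM_EVAL_TRIALS = 100    # meta-testing trials per evaluation point


def summarize_returns(returns_per_seed: np.ndarray):
    """Aggregate meta-test returns of shape (NUM_SEEDS, num_checkpoints):
    the reported curve is the mean over seeds, the error bars are 1 std."""
    assert returns_per_seed.shape[0] == NUM_SEEDS
    return returns_per_seed.mean(axis=0), returns_per_seed.std(axis=0)
```

How the agents are trained between evaluation points is intentionally left out; the quoted text only fixes the split size and the reporting statistics.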