Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
Authors: Evan Z Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, DREAM substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation. ... Empirically, we stress test DREAM's ability to learn sophisticated exploration strategies on 3 challenging, didactic benchmarks and a sparse-reward 3D visual navigation benchmark. On these, DREAM learns to optimally explore and exploit, achieving 90% higher returns than existing state-of-the-art approaches (PEARL, E-RL2, IMPORT, VARIBAD), which struggle to learn an effective exploration strategy (Section 6). |
| Researcher Affiliation | Collaboration | Evan Zheran Liu¹, Aditi Raghunathan¹, Percy Liang¹, Chelsea Finn¹. ¹Department of Computer Science, Stanford University. Correspondence to: Evan Zheran Liu <evanliu@cs.stanford.edu>. ... This work was also supported in part by Google. |
| Pseudocode | Yes | Algorithm 1 DREAM meta-training trial |
| Open Source Code | Yes | Our code is publicly available at https://github.com/ezliu/dream. |
| Open Datasets | Yes | We design new benchmarks meeting the above criteria, testing (i-iii) with didactic benchmarks, and (iv) with a sparse-reward 3D visual navigation benchmark, based on Kamienny et al. (2020), that combines complex exploration with high-dimensional visual inputs. ... Chevalier-Boisvert, M. Gym-Miniworld environment for openai gym. https://github.com/maximecb/gym-miniworld, 2018. |
| Dataset Splits | Yes | We evaluate each approach on 100 meta-testing trials, every 2K meta-training trials. ... We hold out 1 of the 4^3 = 64 problems for meta-testing. |
| Hardware Specification | No | The information is insufficient. The paper does not explicitly mention any specific hardware components (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The information is insufficient. While the paper refers to implementing policies with PyTorch (Paszke et al., 2017) in its references, it does not explicitly list any software dependencies with specific version numbers within the main text or appendices for reproducibility. |
| Experiment Setup | Yes | We report the average returns achieved by each approach in trials with one exploration and one exploitation episode, averaged over 3 seeds with 1-standard deviation error bars (full details in Appendix B.3). |
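
The reporting convention quoted in the Experiment Setup row (returns averaged over 3 seeds, with 1-standard-deviation error bars) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `summarize_over_seeds`, the example returns, and the choice of the population standard deviation are all assumptions made for illustration, since the quoted text does not say which estimator was used.

```python
import statistics


def summarize_over_seeds(returns_per_seed):
    """Return (mean, std) across seeds of each seed's average return.

    returns_per_seed: one list of meta-testing returns per seed.
    """
    # Average the meta-testing returns within each seed first.
    seed_means = [statistics.fmean(returns) for returns in returns_per_seed]
    mean = statistics.fmean(seed_means)
    # Assumption: population standard deviation across seeds.
    # Swap in statistics.stdev for the sample estimator.
    std = statistics.pstdev(seed_means)
    return mean, std


# Hypothetical returns for 3 seeds (made-up numbers, not results from the paper).
example_returns = [
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
mean, std = summarize_over_seeds(example_returns)
print(f"average return: {mean:.3f} +/- {std:.3f}")
```

With only 3 seeds the population and sample estimators differ noticeably, so a reproduction would need to confirm the choice against Appendix B.3 or the released code.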