The Importance of Sampling in Meta-Reinforcement Learning
Authors: Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results are presented on a new environment we call Krazy World: a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-RL2 deliver better performance than baseline algorithms on both tasks. |
| Researcher Affiliation | Collaboration | Bradly C. Stadie (UC Berkeley), Ge Yang (University of Chicago), Rein Houthooft (OpenAI), Xi Chen (Covariant.ai), Yan Duan (Covariant.ai), Yuhuai Wu (University of Toronto), Pieter Abbeel (UC Berkeley), Ilya Sutskever (OpenAI) |
| Pseudocode | Yes | Algorithm 1 E-MAML (an illustrative sketch of the E-MAML-style objective follows this table). |
| Open Source Code | Yes | Code for Krazy World available at: https://github.com/bstadie/krazyworld Code for meta RL algorithms available at: https://github.com/episodeyang/e-maml |
| Open Datasets | No | The paper introduces "Krazy World" as a new environment and refers to "a set of maze environments" without providing concrete access information (specific links, DOIs, or citations to pre-existing public datasets) for the data used in the experiments. It provides code to generate the Krazy World environment, but not a pre-collected dataset from it or the maze environments. |
| Dataset Splits | No | The paper states: "First, we initialize 32 training environments and 64 test environments." and "These test environment results are recorded, 32 new tasks are sampled from the training environments, and data collection begins again." However, it does not explicitly mention a validation set or provide further details about the split (a schematic of this train/test environment protocol follows this table). |
| Hardware Specification | No | The paper mentions support from "AWS and GCE compute credits" in the acknowledgements, but does not provide specific details on the hardware used, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using algorithms like PPO/VPG/CPI and refers to "Q-Learning algorithm from Open-AI baselines" but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Table 1 ("Table of Hyperparameters: E-MAML") lists numerous specific hyperparameters, such as 'PPO Clip Range 0.2', 'Gamma 0 1', 'GAE Lambda 0.997', 'Alpha 0.01', 'Beta 1e-3 60', 'Vf Coeff 0', 'Ent Coeff 1e-3', 'Inner Optimizer SGD', 'Meta Optimizer Adam', 'Inner Gradient Steps 1 20', and 'Meta Gradient Steps 1 20'; several of the two-number entries appear to be search ranges flattened during text extraction. |
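
The Pseudocode row above points to Algorithm 1 (E-MAML). As a rough illustration of the core idea that algorithm captures, keeping the gradient term that credits the pre-update sampling distribution for post-update returns rather than dropping it as plain MAML does, the following is a minimal PyTorch-flavored sketch of an E-MAML-style surrogate objective. It is not the authors' implementation; the function name, tensor shapes, and the choice of PyTorch are assumptions made for illustration.

```python
import torch

def e_maml_surrogate(adapted_policy_loss: torch.Tensor,
                     pre_update_logprobs: torch.Tensor,
                     post_update_returns: torch.Tensor) -> torch.Tensor:
    """Illustrative E-MAML-style meta-objective (not the paper's code).

    adapted_policy_loss : differentiable loss of the adapted (post-update) policy,
                          i.e. the usual MAML term backpropagated through the
                          inner gradient step.
    pre_update_logprobs : log pi_theta(a_t | s_t) for the exploratory rollouts
                          that fed the inner update.
    post_update_returns : returns obtained after adaptation (held constant).
    """
    # Score-function term: credits the pre-update sampling distribution for the
    # returns achieved after adaptation -- the term that plain MAML ignores.
    exploration_term = -(post_update_returns.detach().mean()
                         * pre_update_logprobs.sum())
    return adapted_policy_loss + exploration_term

# Smoke test with random tensors standing in for real rollout statistics.
if __name__ == "__main__":
    logp = torch.randn(50, requires_grad=True)       # pre-update log-probs
    returns = torch.randn(10)                        # post-update returns
    adapted_loss = torch.randn(1, requires_grad=True).sum()
    loss = e_maml_surrogate(adapted_loss, logp, returns)
    loss.backward()
    print(loss.item(), logp.grad.norm().item())
```

The sign convention assumes the optimizer minimizes the loss, so the exploration term is negated relative to the expected-return objective.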
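
The Dataset Splits row quotes the paper's protocol of 32 training environments and 64 test environments, with 32 tasks re-sampled from the training environments each meta-iteration. The loop below sketches that protocol only; `make_env`, the seeds, and sampling with replacement are assumptions, since this report does not document the actual Krazy World constructor.

```python
import random

def make_env(seed: int):
    # Placeholder environment constructor: the real Krazy World API
    # (https://github.com/bstadie/krazyworld) is not described in this report,
    # so a dict stands in for whatever object the code base returns.
    return {"seed": seed}

# Protocol quoted above: 32 training environments, 64 held-out test environments.
train_envs = [make_env(seed=i) for i in range(32)]
test_envs = [make_env(seed=1000 + i) for i in range(64)]

for meta_iteration in range(3):  # a few iterations, for illustration only
    # "32 new tasks are sampled from the training environments" -- sampling
    # with replacement here is an assumption.
    batch = random.choices(train_envs, k=32)
    # ... collect rollouts on `batch` and take the meta-gradient step here ...
    # Periodically record adaptation performance on the held-out test envs.
    print(f"iteration {meta_iteration}: {len(batch)} train tasks, "
          f"{len(test_envs)} test environments for evaluation")
```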