The Importance of Sampling in Meta-Reinforcement Learning
Authors: Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results are presented on a new environment we call Krazy World: a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-RL2 deliver better performance than baseline algorithms on both tasks. |
| Researcher Affiliation | Collaboration | Bradly C. Stadie (UC Berkeley), Ge Yang (University of Chicago), Rein Houthooft (OpenAI), Xi Chen (Covariant.ai), Yan Duan (Covariant.ai), Yuhuai Wu (University of Toronto), Pieter Abbeel (UC Berkeley), Ilya Sutskever (OpenAI) |
| Pseudocode | Yes | Algorithm 1 E-MAML (an illustrative sketch of the E-MAML-style objective follows this table). |
| Open Source Code | Yes | Code for Krazy World available at: https://github.com/bstadie/krazyworld Code for meta RL algorithms available at: https://github.com/episodeyang/e-maml |
| Open Datasets | No | The paper introduces "Krazy World" as a new environment and refers to "a set of maze environments" without providing concrete access information (specific links, DOIs, or citations to pre-existing public datasets) for the data used in the experiments. It provides code to generate the Krazy World environment, but not a pre-collected dataset from it or the maze environments. |
| Dataset Splits | No | The paper states: "First, we initialize 32 training environments and 64 test environments." and "These test environment results are recorded, 32 new tasks are sampled from the training environments, and data collection begins again." However, it does not explicitly mention a validation set or provide further details about the split (a schematic of this train/test environment protocol follows this table). |
| Hardware Specification | No | The paper mentions support from "AWS and GCE compute credits" in the acknowledgements, but does not provide specific details on the hardware used, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using algorithms like PPO/VPG/CPI and refers to "Q-Learning algorithm from Open-AI baselines" but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Table 1 ("Table of Hyperparameters: E-MAML") lists numerous specific hyperparameters, such as 'PPO Clip Range 0.2', 'Gamma 0 1', 'GAE Lambda 0.997', 'Alpha 0.01', 'Beta 1e-3 60', 'Vf Coeff 0', 'Ent Coeff 1e-3', 'Inner Optimizer SGD', 'Meta Optimizer Adam', 'Inner Gradient Steps 1 20', and 'Meta Gradient Steps 1 20'; several of the two-number entries appear to be search ranges flattened during text extraction. |
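
The Pseudocode row above points to Algorithm 1 (E-MAML). As a rough illustration of the core idea that algorithm captures, keeping the gradient term that credits the pre-update sampling distribution for post-update returns rather than dropping it as plain MAML does, the following is a minimal PyTorch-flavored sketch of an E-MAML-style surrogate objective. It is not the authors' implementation; the function name, tensor shapes, and the choice of PyTorch are assumptions made for illustration.

```python
import torch

def e_maml_surrogate(adapted_policy_loss: torch.Tensor,
                     pre_update_logprobs: torch.Tensor,
                     post_update_returns: torch.Tensor) -> torch.Tensor:
    """Illustrative E-MAML-style meta-objective (not the paper's code).

    adapted_policy_loss : differentiable loss of the adapted (post-update) policy,
                          i.e. the usual MAML term backpropagated through the
                          inner gradient step.
    pre_update_logprobs : log pi_theta(a_t | s_t) for the exploratory rollouts
                          that fed the inner update.
    post_update_returns : returns obtained after adaptation (held constant).
    """
    # Score-function term: credits the pre-update sampling distribution for the
    # returns achieved after adaptation -- the term that plain MAML ignores.
    exploration_term = -(post_update_returns.detach().mean()
                         * pre_update_logprobs.sum())
    return adapted_policy_loss + exploration_term

# Smoke test with random tensors standing in for real rollout statistics.
if __name__ == "__main__":
    logp = torch.randn(50, requires_grad=True)       # pre-update log-probs
    returns = torch.randn(10)                        # post-update returns
    adapted_loss = torch.randn(1, requires_grad=True).sum()
    loss = e_maml_surrogate(adapted_loss, logp, returns)
    loss.backward()
    print(loss.item(), logp.grad.norm().item())
```

The sign convention assumes the optimizer minimizes the loss, so the exploration term is negated relative to the expected-return objective.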
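
The Dataset Splits row quotes the paper's protocol of 32 training environments and 64 test environments, with 32 tasks re-sampled from the training environments each meta-iteration. The loop below sketches that protocol only; `make_env`, the seeds, and sampling with replacement are assumptions, since this report does not document the actual Krazy World constructor.

```python
import random

def make_env(seed: int):
    # Placeholder environment constructor: the real Krazy World API
    # (https://github.com/bstadie/krazyworld) is not described in this report,
    # so a dict stands in for whatever object the code base returns.
    return {"seed": seed}

# Protocol quoted above: 32 training environments, 64 held-out test environments.
train_envs = [make_env(seed=i) for i in range(32)]
test_envs = [make_env(seed=1000 + i) for i in range(64)]

for meta_iteration in range(3):  # a few iterations, for illustration only
    # "32 new tasks are sampled from the training environments" -- sampling
    # with replacement here is an assumption.
    batch = random.choices(train_envs, k=32)
    # ... collect rollouts on `batch` and take the meta-gradient step here ...
    # Periodically record adaptation performance on the held-out test envs.
    print(f"iteration {meta_iteration}: {len(batch)} train tasks, "
          f"{len(test_envs)} test environments for evaluation")
```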