Guided Meta-Policy Search

Authors: Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
Researcher Affiliation | Academia | Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn; Department of Electrical Engineering and Computer Science, University of California, Berkeley; {russellm, cbfinn}@berkeley.edu, {abhigupta, pabbeel, svlevine}@eecs.berkeley.edu, rdkralev@gmail.com
Pseudocode | Yes | Algorithm 1 GMPS: Guided Meta-Policy Search (a hedged sketch of the algorithm's two-level structure appears after this table).
Open Source Code | No | The paper states 'Videos of our results are available online' and gives https://sites.google.com/berkeley.edu/guided-metapolicy-search/home, which is a project homepage for result videos, not an explicit statement of source-code availability or a link to code.
Open Datasets | Yes | This environment uses the ant environment in OpenAI Gym [3]. [3] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
Dataset Splits | Yes | During meta-training, a task T is sampled, along with data from that task, which is randomly partitioned into two sets, D^tr and D^val. MAML optimizes for a set of model parameters θ such that one or a few gradient steps on D^tr produces good performance on D^val (a minimal sketch of this split-and-adapt step appears after this table).
Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU or CPU models, or cloud computing instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software such as OpenAI Gym and soft actor-critic (SAC), but does not provide version numbers for these or any other software dependencies.
Experiment Setup | No | Further details such as the reward functions for all environments, network architectures, and hyperparameters swept over are in the appendix.
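
The split-and-adapt procedure quoted in the Dataset Splits row can be made concrete with a small, NumPy-only sketch. Only the random partition of task data into D^tr and D^val, the inner gradient step on D^tr, and the outer step toward better post-adaptation performance on D^val follow the quoted description; the toy linear-regression task, squared-error loss, learning rates, and first-order outer update are illustrative assumptions, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)

def make_task():
    # Toy task: regress y = w_task * x, with a task-specific slope w_task.
    w_task = rng.uniform(-2.0, 2.0)
    x = rng.normal(size=20)
    return x, w_task * x

def grad(theta, x, y):
    # Gradient of the squared-error loss of the linear model y_hat = theta * x.
    return np.mean(2.0 * (theta * x - y) * x)

theta = 0.0                        # meta-parameters (a single scalar here)
inner_lr, outer_lr = 0.1, 0.01

for _ in range(500):
    x, y = make_task()                         # sample a task and data from it
    perm = rng.permutation(len(x))             # randomly partition into D_tr and D_val
    tr, val = perm[:10], perm[10:]
    theta_adapted = theta - inner_lr * grad(theta, x[tr], y[tr])   # inner step on D_tr
    # Outer step: improve post-adaptation performance on D_val
    # (first-order approximation: the meta-gradient is taken at theta_adapted).
    theta = theta - outer_lr * grad(theta_adapted, x[val], y[val])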
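
The Algorithm 1 caption in the Pseudocode row refers to GMPS, whose defining structure is a reinforcement-learning adaptation step in the inner loop and supervised imitation of per-task experts in the outer loop. The sketch below mirrors only that structure; the one-step linear-Gaussian toy task, the REINFORCE estimator, the fixed exploration noise, the first-order outer update, and all hyperparameters are assumptions made for illustration, not the paper's environments, policy class, or exact update rules.

import numpy as np

rng = np.random.default_rng(1)
SIGMA = 0.5        # fixed exploration noise of the Gaussian policy (assumption)

def reinforce_grad(theta, w_task, n=64):
    # One-step REINFORCE estimate of d/dtheta E[reward] for a ~ N(theta * s, SIGMA^2),
    # with reward = -(a - w_task * s)^2, i.e. how close the action is to the expert's.
    s = rng.normal(size=n)
    a = theta * s + SIGMA * rng.normal(size=n)
    r = -(a - w_task * s) ** 2
    return np.mean(r * (a - theta * s) * s / SIGMA ** 2)

def imitation_grad(theta, w_task, n=64):
    # Gradient of the behavior-cloning loss E[(theta * s - w_task * s)^2] against the
    # per-task expert controller a_expert = w_task * s.
    s = rng.normal(size=n)
    return np.mean(2.0 * (theta * s - w_task * s) * s)

theta = 0.0
inner_lr, outer_lr = 0.05, 0.02

for _ in range(300):
    w_task = rng.uniform(-2.0, 2.0)     # sample a meta-training task (its expert slope)
    # Inner loop: adapt to the task with a policy-gradient step on the task reward
    # (gradient ascent, using only reward signal, as at meta-test time).
    theta_adapted = theta + inner_lr * reinforce_grad(theta, w_task)
    # Outer loop: supervised imitation of that task's expert, evaluated at the adapted
    # parameters (first-order approximation of the meta-gradient).
    theta = theta - outer_lr * imitation_grad(theta_adapted, w_task)

In GMPS the per-task experts are themselves obtained with standard reinforcement learning before meta-training; in this sketch the expert is simply the known slope w_task, so no expert-training phase appears.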