Guided Meta-Policy Search
Authors: Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations. |
| Researcher Affiliation | Academia | Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn Department of Electrical Engineering and Computer Science University of California, Berkeley {russellm, cbfinn}@berkeley.edu {abhigupta, pabbeel, svlevine}@eecs.berkeley.edu rdkralev@gmail.com |
| Pseudocode | Yes | Algorithm 1 GMPS: Guided Meta-Policy Search |
| Open Source Code | No | The paper states 'Videos of our results are available online' and, in a footnote, 'The website is at https://sites.google.com/berkeley.edu/guided-metapolicy-search/home'. This is a project homepage for result videos, not an explicit statement of, or link to, source code availability. |
| Open Datasets | Yes | This environment uses the ant environment in OpenAI gym [3]. [3] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2017. |
| Dataset Splits | Yes | During meta-training, a task T is sampled, along with data from that task, which is randomly partitioned into two sets, Dtr and Dval. MAML optimizes for a set of model parameters θ such that one or a few gradient steps on Dtr produces good performance on Dval. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU, CPU models, or cloud computing instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'OpenAI gym' and 'soft actor-critic (SAC)', but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | No | Further details such as the reward functions for all environments, network architectures, and hyperparameters swept over are in the appendix. |
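
The Dataset Splits row describes the MAML objective: adapt θ with a gradient step on Dtr, then judge the adapted parameters on Dval. The sketch below illustrates that split on a toy problem and is not the authors' implementation. It assumes a scalar model y = θ·x with a mean-squared-error loss, and all function names, the task distribution, and the step sizes `alpha` and `beta` are hypothetical choices made for illustration.

```python
import random


def mse(theta, data):
    """Mean squared error of the toy model y = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def mse_grad(theta, data):
    """First derivative of the MSE with respect to theta."""
    return sum(2 * x * (theta * x - y) for x, y in data) / len(data)


def mse_hess(data):
    """Second derivative of the MSE with respect to theta (constant in theta)."""
    return sum(2 * x * x for x, _ in data) / len(data)


def inner_step(theta, d_tr, alpha):
    """One gradient step on D_tr (the adaptation step)."""
    return theta - alpha * mse_grad(theta, d_tr)


def meta_gradient(theta, d_tr, d_val, alpha):
    """Derivative of the D_val loss after adaptation, taken w.r.t. the initial theta.

    Chain rule: dL_val(theta') / dtheta = L_val'(theta') * dtheta'/dtheta,
    where theta' = theta - alpha * L_tr'(theta), so dtheta'/dtheta = 1 - alpha * L_tr''.
    """
    theta_adapted = inner_step(theta, d_tr, alpha)
    d_adapted_d_theta = 1.0 - alpha * mse_hess(d_tr)
    return mse_grad(theta_adapted, d_val) * d_adapted_d_theta


def sample_task(rng, n=10):
    """Hypothetical task distribution: a random slope, with data randomly split in two."""
    slope = rng.uniform(-2.0, 2.0)
    data = [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]
    rng.shuffle(data)
    return data[: n // 2], data[n // 2:]


def meta_train(steps=200, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: sample a task, split its data, and update theta with the meta-gradient."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        d_tr, d_val = sample_task(rng)
        theta -= beta * meta_gradient(theta, d_tr, d_val, alpha)
    return theta
```

In this toy setting the second-order term is available in closed form; with neural network policies it would typically come from automatic differentiation.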