Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Guided Meta-Policy Search
Authors: Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations. |
| Researcher Affiliation | Academia | Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn; Department of Electrical Engineering and Computer Science, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 GMPS: Guided Meta-Policy Search |
| Open Source Code | No | The paper states 'Videos of our results are available online. The website is at https://sites.google.com/berkeley.edu/guided-metapolicy-search/home', which points to a project homepage for result videos rather than an explicit statement of, or link to, source code. |
| Open Datasets | Yes | This environment uses the ant environment in OpenAI Gym [3]. [3] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2017. |
| Dataset Splits | Yes | During meta-training, a task T is sampled, along with data from that task, which is randomly partitioned into two sets, D_tr and D_val. MAML optimizes for a set of model parameters θ such that one or a few gradient steps on D_tr produces good performance on D_val. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU, CPU models, or cloud computing instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'OpenAI Gym' and 'soft actor-critic (SAC)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | No | Further details such as the reward functions for all environments, network architectures, and hyperparameters swept over are in the appendix. |
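The Dataset Splits entry quotes the MAML objective that GMPS builds on. Each sampled task's data is randomly partitioned into D_tr and D_val, and θ is optimized so that a gradient step on D_tr performs well on D_val. The sketch below shows that split-and-adapt loop. It rests on several assumptions: the toy 1-D regression task family (y = slope · x, slope ~ U(1, 3)), the helper names, and the step sizes α (inner) and β (outer) are all hypothetical and chosen only for illustration. It is plain second-order MAML, not the GMPS algorithm from the paper.

```python
import random

def loss_and_grads(w, data):
    """MSE of the hypothetical 1-D model y_hat = w * x, with its first and
    second derivatives with respect to w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * x * (w * x - y) for x, y in data) / n
    hess = sum(2 * x * x for x, _ in data) / n
    return loss, grad, hess

def sample_task(rng, n_points=20):
    """Hypothetical task family: noiseless y = slope * x, slope ~ U(1, 3)."""
    slope = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return [(x, slope * x) for x in xs]

def split(rng, data):
    """Randomly partition one task's data into D_tr and D_val (half each)."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def maml_meta_gradient(w, d_tr, d_val, alpha):
    """Take one inner gradient step on D_tr, then return the D_val loss after
    adaptation and its exact gradient with respect to the initial w."""
    _, g_tr, h_tr = loss_and_grads(w, d_tr)
    w_adapted = w - alpha * g_tr                   # inner-loop update on D_tr
    val_loss, g_val, _ = loss_and_grads(w_adapted, d_val)
    # Chain rule through the inner step: d(w_adapted)/dw = 1 - alpha * h_tr
    return val_loss, (1.0 - alpha * h_tr) * g_val

def meta_train(steps=500, tasks_per_batch=5, alpha=0.5, beta=0.1, seed=0):
    """Outer loop: average meta-gradients over a batch of tasks, update theta."""
    rng = random.Random(seed)
    w = 0.0  # meta-initialization (theta in the quoted text)
    for _ in range(steps):
        total = 0.0
        for _ in range(tasks_per_batch):
            d_tr, d_val = split(rng, sample_task(rng))
            _, meta_grad = maml_meta_gradient(w, d_tr, d_val, alpha)
            total += meta_grad
        w -= beta * total / tasks_per_batch        # outer-loop update
    return w

theta = meta_train()
```

The targets are noiseless, so the inner step shrinks the error exactly: w_adapted - slope = (1 - α·H_tr)(w - slope). As a result, the meta-learned θ should settle near the mean task slope of 2.0. The analytic chain-rule factor stands in for the automatic differentiation that a real implementation would use.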