Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ProMP: Proximal Meta-Policy Search
Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that ProMP consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. In order to empirically validate the theoretical arguments outlined above, this section provides a detailed experimental analysis that aims to answer the following questions: (i) How does ProMP perform against previous Meta-RL algorithms? (ii) How do the lower variance but biased LVC gradient estimates compare to the high variance, unbiased DiCE estimates? (iii) Do the different formulations result in different pre-update exploration properties? (iv) How do formulation I and formulation II differ in their meta-gradient estimates and convergence properties? |
| Researcher Affiliation | Collaboration | Jonas Rothfuss: UC Berkeley, KIT; Dennis Lee, Ignasi Clavera: UC Berkeley; Tamim Asfour: Karlsruhe Inst. of Technology (KIT); Pieter Abbeel: UC Berkeley, Covariant.ai |
| Pseudocode | Yes | Algorithm 1 Proximal Meta-Policy Search (ProMP) |
| Open Source Code | Yes | The source code and the experiment data are available on our supplementary website (footnote 2 in the paper). |
| Open Datasets | Yes | To answer the posed questions, we evaluate our approach on six continuous control Meta-RL benchmark environments based on OpenAI Gym and the MuJoCo simulator (Brockman et al., 2016; Todorov et al., 2012). |
| Dataset Splits | No | The paper evaluates on continuous control Meta-RL benchmark environments (OpenAI Gym, MuJoCo), where predefined train/validation/test splits in the supervised-learning sense do not directly apply. It does not specify any fixed dataset splits for these environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | Table 1 contains the hyperparameter settings used for the different algorithms. Any environment-specific modifications are noted in the respective paragraph describing the environment. |