Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ProMP: Proximal Meta-Policy Search
Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that ProMP consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. In order to empirically validate the theoretical arguments outlined above, this section provides a detailed experimental analysis that aims to answer the following questions: (i) How does ProMP perform against previous Meta-RL algorithms? (ii) How do the lower variance but biased LVC gradient estimates compare to the high variance, unbiased DiCE estimates? (iii) Do the different formulations result in different pre-update exploration properties? (iv) How do formulation I and formulation II differ in their meta-gradient estimates and convergence properties? |
| Researcher Affiliation | Collaboration | Jonas Rothfuss, UC Berkeley, KIT; Dennis Lee, Ignasi Clavera, UC Berkeley; Tamim Asfour, Karlsruhe Inst. of Technology (KIT); Pieter Abbeel, UC Berkeley, Covariant.ai |
| Pseudocode | Yes | Algorithm 1 Proximal Meta-Policy Search (ProMP) |
| Open Source Code | Yes | The source code and the experiment data are available on our supplementary website. |
| Open Datasets | Yes | To answer the posed questions, we evaluate our approach on six continuous control Meta-RL benchmark environments based on OpenAI Gym and the MuJoCo simulator (Brockman et al., 2016; Todorov et al., 2012). |
| Dataset Splits | No | The paper uses continuous control Meta-RL benchmark environments (OpenAI Gym, MuJoCo), where predefined train/validation/test dataset splits are not applicable in the way they are in supervised learning. It does not specify any fixed dataset splits for these environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions the software used but does not specify library versions or a full dependency list. |
| Experiment Setup | Yes | Table 1 contains the hyperparameter settings used for the different algorithms. Any environment specific modifications are noted in the respective paragraph describing the environment. |