ProMP: Proximal Meta-Policy Search
Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that ProMP consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. In order to empirically validate the theoretical arguments outlined above, this section provides a detailed experimental analysis that aims to answer the following questions: (i) How does ProMP perform against previous Meta-RL algorithms? (ii) How do the lower variance but biased LVC gradient estimates compare to the high variance, unbiased DiCE estimates? (iii) Do the different formulations result in different pre-update exploration properties? (iv) How do formulation I and formulation II differ in their meta-gradient estimates and convergence properties? |
| Researcher Affiliation | Collaboration | Jonas Rothfuss UC Berkeley, KIT jonas.rothfuss@gmail.com; Dennis Lee, Ignasi Clavera UC Berkeley {dennisl88,iclavera}@berkeley.edu; Tamim Asfour Karlsruhe Inst. of Technology (KIT) asfour@kit.edu; Pieter Abbeel UC Berkeley, Covariant.ai pabbeel@cs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Proximal Meta-Policy Search (ProMP) |
| Open Source Code | Yes | The source code and the experiment data are available on our supplementary website. |
| Open Datasets | Yes | To answer the posed questions, we evaluate our approach on six continuous control Meta-RL benchmark environments based on OpenAI Gym and the Mujoco simulator (Brockman et al., 2016; Todorov et al., 2012). |
| Dataset Splits | No | The paper uses continuous control Meta-RL benchmark environments (OpenAI Gym, Mujoco) where the concept of predefined train/validation/test dataset splits is not explicitly applicable as in supervised learning. It does not specify any fixed dataset splits for these environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | Table 1 contains the hyperparameter settings used for the different algorithms. Any environment specific modifications are noted in the respective paragraph describing the environment. |