ProMP: Proximal Meta-Policy Search

Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that ProMP consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. In order to empirically validate the theoretical arguments outlined above, this section provides a detailed experimental analysis that aims to answer the following questions: (i) How does ProMP perform against previous Meta-RL algorithms? (ii) How do the lower variance but biased LVC gradient estimates compare to the high variance, unbiased DiCE estimates? (iii) Do the different formulations result in different pre-update exploration properties? (iv) How do formulation I and formulation II differ in their meta-gradient estimates and convergence properties?
Researcher Affiliation | Collaboration | Jonas Rothfuss UC Berkeley, KIT jonas.rothfuss@gmail.com; Dennis Lee, Ignasi Clavera UC Berkeley {dennisl88,iclavera}@berkeley.edu; Tamim Asfour Karlsruhe Inst. of Technology (KIT) asfour@kit.edu; Pieter Abbeel UC Berkeley, Covariant.ai pabbeel@cs.berkeley.edu
Pseudocode | Yes | Algorithm 1 Proximal Meta-Policy Search (ProMP)
Open Source Code | Yes | The source code and the experiment data are available on our supplementary website.
Open Datasets | Yes | To answer the posed questions, we evaluate our approach on six continuous control Meta-RL benchmark environments based on OpenAI Gym and the Mujoco simulator (Brockman et al., 2016; Todorov et al., 2012).
Dataset Splits | No | The paper uses continuous control Meta-RL benchmark environments (OpenAI Gym, Mujoco), for which predefined train/validation/test dataset splits do not apply as they do in supervised learning. It does not specify any fixed dataset splits for these environments.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions using
Experiment Setup | Yes | Table 1 contains the hyperparameter settings used for the different algorithms. Any environment specific modifications are noted in the respective paragraph describing the environment.