reproducibilityindex.ai

ProMP: Proximal Meta-Policy Search

Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel

ICLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our experiments, we show that ProMP consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. In order to empirically validate the theoretical arguments outlined above, this section provides a detailed experimental analysis that aims to answer the following questions: (i) How does ProMP perform against previous Meta-RL algorithms? (ii) How do the lower variance but biased LVC gradient estimates compare to the high variance, unbiased DiCE estimates? (iii) Do the different formulations result in different pre-update exploration properties? (iv) How do formulation I and formulation II differ in their meta-gradient estimates and convergence properties?
Researcher Affiliation	Collaboration	Jonas Rothfuss UC Berkeley, KIT jonas.rothfuss@gmail.com; Dennis Lee, Ignasi Clavera UC Berkeley {dennisl88,iclavera}@berkeley.edu; Tamim Asfour Karlsruhe Inst. of Technology (KIT) asfour@kit.edu; Pieter Abbeel UC Berkeley, Covariant.ai pabbeel@cs.berkeley.edu
Pseudocode	Yes	Algorithm 1 Proximal Meta-Policy Search (ProMP)
Open Source Code	Yes	The source code and the experiment data are available on our supplementary website.
Open Datasets	Yes	To answer the posed questions, we evaluate our approach on six continuous control Meta-RL benchmark environments based on OpenAI Gym and the Mujoco simulator (Brockman et al., 2016; Todorov et al., 2012).
Dataset Splits	No	The paper uses continuous control Meta-RL benchmark environments (OpenAI Gym, Mujoco) where the concept of predefined train/validation/test dataset splits is not explicitly applicable as in supervised learning. It does not specify any fixed dataset splits for these environments.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies	No	The paper mentions using
Experiment Setup	Yes	Table 1 contains the hyperparameter settings used for the different algorithms. Any environment specific modifications are noted in the respective paragraph describing the environment.