Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ProMP: Proximal Meta-Policy Search

Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In our experiments, we show that ProMP consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. In order to empirically validate the theoretical arguments outlined above, this section provides a detailed experimental analysis that aims to answer the following questions: (i) How does ProMP perform against previous Meta-RL algorithms? (ii) How do the lower variance but biased LVC gradient estimates compare to the high variance, unbiased DiCE estimates? (iii) Do the different formulations result in different pre-update exploration properties? (iv) How do formulation I and formulation II differ in their meta-gradient estimates and convergence properties? |
| Researcher Affiliation | Collaboration | Jonas Rothfuss UC Berkeley, KIT EMAIL; Dennis Lee, Ignasi Clavera UC Berkeley EMAIL; Tamim Asfour Karlsruhe Inst. of Technology (KIT) EMAIL; Pieter Abbeel UC Berkeley, Covariant.ai EMAIL |
| Pseudocode | Yes | Algorithm 1 Proximal Meta-Policy Search (ProMP) |
| Open Source Code | Yes | The source code and the experiment data are available on our supplementary website. |
| Open Datasets | Yes | To answer the posed questions, we evaluate our approach on six continuous control Meta-RL benchmark environments based on OpenAI Gym and the Mujoco simulator (Brockman et al., 2016; Todorov et al., 2012). |
| Dataset Splits | No | The paper uses continuous control Meta-RL benchmark environments (OpenAI Gym, Mujoco) where the concept of predefined train/validation/test dataset splits is not explicitly applicable as in supervised learning. It does not specify any fixed dataset splits for these environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | Table 1 contains the hyperparameter settings used for the different algorithms. Any environment specific modifications are noted in the respective paragraph describing the environment. |