Meta-Reinforcement Learning of Structured Exploration Strategies

Authors: Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation."
Researcher Affiliation | Academia | Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine; Department of Electrical Engineering and Computer Science, University of California, Berkeley; {abhigupta, pabbeel, svlevine}@eecs.berkeley.edu, {russellm, yuxuanliu}@berkeley.edu
Pseudocode | Yes | "Algorithm 1 MAESN meta-RL algorithm"
Open Source Code | No | "Videos and experimental details for all our experiments can be found at https://sites.google.com/view/meta-explore/." This statement does not explicitly confirm the release of the methodology's source code.
Open Datasets | No | The paper describes custom 'simulated tasks' and 'task distributions' but does not provide concrete access information (link, DOI, specific repository, or formal citation with author/year for a public dataset) for them.
Dataset Splits | No | "Rewards are averaged over 100 validation tasks, which have sparse rewards as described in supplementary material" (Figure 3 caption), and results are "averaged across 30 validation tasks" (Section 4.3). These passages mention validation tasks but do not describe train/validation/test splits of a dataset.
Hardware Specification | No | "All experiments were initially run on a local 2 GPU machine, and run at scale using Amazon Web Services." This does not provide specific hardware models or detailed specifications.
Software Dependencies | No | The paper mentions 'trust region policy optimization (TRPO) [24]' and other algorithms but does not provide specific software names with version numbers or library dependencies.
Experiment Setup | No | "Hyperparameters of each algorithm are mentioned in the supplementary materials, which were selected via a hyperparameter sweep (also detailed in the appendix)." This statement indicates that the details are not in the main text.