Meta-Reinforcement Learning of Structured Exploration Strategies
Authors: Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation. |
| Researcher Affiliation | Academia | Abhishek Gupta, Russell Mendonca, Yu Xuan Liu, Pieter Abbeel, Sergey Levine Department of Electrical Engineering and Computer Science University of California, Berkeley {abhigupta, pabbeel, svlevine}@eecs.berkeley.edu {russellm, yuxuanliu}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 MAESN meta-RL algorithm. A hedged structural sketch of this loop is given after the table. |
| Open Source Code | No | Videos and experimental details for all our experiments can be found at https://sites.google.com/view/meta-explore/. The linked project site hosts videos and experimental details, but the paper does not confirm that source code was released. |
| Open Datasets | No | The paper describes custom 'simulated tasks' and 'task distributions' but provides no concrete access information for them (link, DOI, specific repository, or formal citation with author/year for a public dataset). |
| Dataset Splits | No | Rewards are averaged over 100 validation tasks, which have sparse rewards as described in supplementary material (Figure 3 caption), and averaged across 30 validation tasks (Section 4.3). These are held-out validation tasks for meta-learning, not train/validation/test splits of a dataset. |
| Hardware Specification | No | All experiments were initially run on a local 2-GPU machine, and run at scale using Amazon Web Services. This does not specify hardware models or detailed specifications. |
| Software Dependencies | No | The paper mentions 'trust region policy optimization (TRPO) [24]' and other algorithms but does not name specific software packages with version numbers or list library dependencies. |
| Experiment Setup | No | Hyperparameters for each algorithm, selected via a hyperparameter sweep, are given in the supplementary materials (the sweep is also detailed in the appendix); the setup details are therefore not in the main text. |
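To make the quoted pseudocode concrete, below is a minimal structural sketch of a MAESN-style meta-training loop: a policy conditioned on a per-episode latent z, whose per-task variational parameters are adapted in an inner loop while the policy weights are meta-trained for post-adaptation return. Everything here is an illustrative assumption rather than the authors' implementation: the dimensions, reward, and task distribution are toy stand-ins, a finite-difference estimator replaces the paper's policy-gradient (TRPO) updates, only the latent mean is adapted, and the KL regularization to the latent prior N(0, I) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT, LAT = 3, 1, 2            # toy observation/action/latent sizes
INNER_LR, META_LR = 0.5, 0.05      # inner (adaptation) and meta step sizes
EPS, N_EPISODES = 0.1, 32

def episode_return(theta, mu, log_sigma, task_bias):
    # Structured noise: one latent z per episode, held fixed across steps,
    # so exploration is temporally coherent rather than per-timestep.
    z = mu + np.exp(log_sigma) * rng.standard_normal(LAT)
    ret = 0.0
    for _ in range(5):
        obs = rng.standard_normal(OBS)
        a = np.concatenate([obs, z]) @ theta        # theta: (OBS+LAT, ACT)
        ret -= float(np.sum((a - task_bias) ** 2))  # toy reward: match bias
    return ret

def avg_return(theta, mu, log_sigma, task_bias):
    return float(np.mean([episode_return(theta, mu, log_sigma, task_bias)
                          for _ in range(N_EPISODES)]))

def fd_grad(f, x):
    # Central finite differences stand in for the paper's policy gradient;
    # estimates are noisy because the objective is stochastic.
    g = np.zeros_like(x)
    flat, gflat = x.ravel(), g.ravel()
    for i in range(flat.size):
        old = flat[i]
        flat[i] = old + EPS; hi = f(x)
        flat[i] = old - EPS; lo = f(x)
        flat[i] = old
        gflat[i] = (hi - lo) / (2 * EPS)
    return g

theta = 0.1 * rng.standard_normal((OBS + LAT, ACT))
tasks = [np.array([b]) for b in (-1.0, 0.0, 1.0)]   # toy task distribution

for step in range(50):
    meta_grad = np.zeros_like(theta)
    for bias in tasks:
        # Inner loop: adapt only the per-task latent mean from the prior
        # N(0, I) (the paper also adapts sigma and adds a KL penalty).
        mu, log_sigma = np.zeros(LAT), np.zeros(LAT)
        mu += INNER_LR * fd_grad(
            lambda m: avg_return(theta, m, log_sigma, bias), mu)
        # Outer loop: accumulate a meta-gradient on theta for the
        # post-adaptation return, in the spirit of MAML/MAESN.
        meta_grad += fd_grad(
            lambda t: avg_return(t, mu, log_sigma, bias), theta)
    theta += META_LR * meta_grad / len(tasks)

print("meta-trained theta norm:", round(float(np.linalg.norm(theta)), 3))
```

The structural point the sketch preserves is that exploration noise enters through a latent sampled once per episode, so inner-loop adaptation reshapes a coherent exploration strategy rather than per-timestep action noise.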