Unsupervised Curricula for Visual Meta-Reinforcement Learning
Authors: Allan Jabri, Kyle Hsu, Ben Eysenbach, Abhishek Gupta, Sergey Levine, Chelsea Finn
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment in visual navigation and visuomotor control domains to study the following questions: ... In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions. |
| Researcher Affiliation | Academia | Allan Jabri^α, Kyle Hsu^β, Benjamin Eysenbach^γ, Abhishek Gupta^α, Sergey Levine^α, Chelsea Finn^δ (α: UC Berkeley, β: University of Toronto, γ: Carnegie Mellon University, δ: Stanford University) |
| Pseudocode | Yes | Pseudocode for our method is presented in Algorithm 1. Algorithm 2: Task Acquisition via Discriminative Clustering |
| Open Source Code | No | Videos are available at the project website https://sites.google.com/view/carml. This link is for videos and does not explicitly state that source code for the methodology is provided. |
| Open Datasets | Yes | We experiment in visual navigation and visuomotor control domains... The first domain we consider is first-person visual navigation in ViZDoom [32]... We consider a simulated Sawyer arm interacting with an object in MuJoCo [60]... |
| Dataset Splits | No | The paper describes meta-training and testing on different task distributions (e.g., 'unsupervised meta-training' and 'test task distributions'), but does not specify explicit percentages or counts for training, validation, and test dataset splits in the conventional sense of supervised learning. |
| Hardware Specification | No | The paper does not specify any details about the hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like 'RL2 algorithm [13]' and 'PPO [53] optimizer', but it does not provide specific version numbers for these or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | Other experimental details are explained in more detail in Appendix B. Meta-RL. CARML is agnostic to the meta-RL algorithm used in the M-step. We use the RL2 algorithm [13], which has previously been evaluated on simpler visual meta-RL domains, with a PPO [53] optimizer. Unless otherwise stated, we use four episodes per trial (compared to the two episodes per trial used in [13]), since the settings we consider involve more challenging task inference. (A hedged sketch of this setup appears below the table.) |
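
The "Pseudocode" and "Experiment Setup" rows reference the paper's two alternating components: task acquisition via discriminative clustering (Algorithm 2) and RL2 meta-training with a PPO optimizer over four-episode trials. The block below is a minimal, hypothetical sketch of that alternation, not the authors' implementation: it assumes a Gaussian mixture over pre-computed observation embeddings as a stand-in for the paper's clustering model, and it replaces the PPO update with a placeholder trial loop that only illustrates how a recurrent state would persist across the four episodes of a trial. All function names, constants, and the toy reward are illustrative assumptions.

```python
# Hypothetical sketch of a CARML-style alternation: an E-step that acquires
# tasks by clustering observation embeddings, and an M-step that treats each
# cluster as a task for an RL2-style recurrent policy run over multi-episode
# trials. The mixture model, embeddings, and trial loop below are stand-ins,
# not the paper's implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

RNG = np.random.default_rng(0)
EMBED_DIM = 8           # dimensionality of (assumed) observation embeddings
NUM_TASKS = 5           # number of mixture components / acquired "tasks"
EPISODES_PER_TRIAL = 4  # the paper reports four episodes per trial
EPISODE_LEN = 20

def collect_embeddings(num_steps: int) -> np.ndarray:
    """Stand-in for embedding observations gathered by the current policy."""
    return RNG.normal(size=(num_steps, EMBED_DIM))

def e_step(embeddings: np.ndarray) -> GaussianMixture:
    """Task acquisition via clustering: fit a mixture whose components define
    the task distribution used in the next M-step."""
    mixture = GaussianMixture(n_components=NUM_TASKS, random_state=0)
    mixture.fit(embeddings)
    return mixture

def task_reward(mixture: GaussianMixture, obs_embedding: np.ndarray, task: int) -> float:
    """Toy self-supervised reward: how strongly the clustering model attributes
    the current observation to the sampled task component."""
    posterior = mixture.predict_proba(obs_embedding[None])[0]
    return float(np.log(posterior[task] + 1e-8))

def m_step(mixture: GaussianMixture, num_trials: int = 10) -> float:
    """Meta-RL step (placeholder): each trial samples a task, then runs several
    episodes while a recurrent state persists across episode boundaries, as in
    RL2. A real implementation would update the policy with PPO here."""
    total_return = 0.0
    for _ in range(num_trials):
        task = RNG.integers(NUM_TASKS)
        hidden = np.zeros(EMBED_DIM)            # recurrent state carried across episodes
        for _ in range(EPISODES_PER_TRIAL):
            for _ in range(EPISODE_LEN):
                obs = RNG.normal(size=EMBED_DIM)   # stand-in observation embedding
                hidden = 0.9 * hidden + 0.1 * obs  # toy recurrent update
                total_return += task_reward(mixture, obs, task)
    return total_return / num_trials

if __name__ == "__main__":
    for outer_iter in range(3):                  # alternate E- and M-steps
        mixture = e_step(collect_embeddings(2000))
        avg_return = m_step(mixture)
        print(f"iter {outer_iter}: average trial return {avg_return:.2f}")
```

In the method as described, the M-step would update a recurrent meta-policy with PPO on the mixture-defined rewards, and the E-step would then refit the clustering model on trajectories gathered by that updated policy; the sketch only mirrors that control flow.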