Unsupervised Curricula for Visual Meta-Reinforcement Learning

Authors: Allan Jabri, Kyle Hsu, Abhishek Gupta, Ben Eysenbach, Sergey Levine, Chelsea Finn

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experiment in visual navigation and visuomotor control domains to study the following questions: ... In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions."
Researcher Affiliation | Academia | "Allan Jabri (α), Kyle Hsu (β), Benjamin Eysenbach (γ), Abhishek Gupta (α), Sergey Levine (α), Chelsea Finn (δ); α UC Berkeley, β University of Toronto, γ Carnegie Mellon University, δ Stanford University"
Pseudocode | Yes | "Pseudocode for our method is presented in Algorithm 1." "Algorithm 2: Task Acquisition via Discriminative Clustering" (A hedged sketch of this clustering step follows the table.)
Open Source Code | No | "Videos are available at the project website https://sites.google.com/view/carml." The link points to videos; the paper does not state that source code for the method is released.
Open Datasets | Yes | "We experiment in visual navigation and visuomotor control domains..." "The first domain we consider is first-person visual navigation in ViZDoom [32]..." "We consider a simulated Sawyer arm interacting with an object in MuJoCo [60]..."
Dataset Splits | No | The paper describes meta-training and testing on different task distributions (e.g., 'unsupervised meta-training' and 'test task distributions'), but does not give explicit percentages or counts for training, validation, and test splits in the conventional supervised-learning sense.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software components such as the 'RL2 algorithm [13]' and the 'PPO [53] optimizer', but does not give version numbers for these or for any other dependencies, such as programming languages or libraries.
Experiment Setup | Yes | "Other experimental details are explained in more detail in Appendix B. Meta-RL. CARML is agnostic to the meta-RL algorithm used in the M-step. We use the RL2 algorithm [13], which has previously been evaluated on simpler visual meta-RL domains, with a PPO [53] optimizer. Unless otherwise stated, we use four episodes per trial (compared to the two episodes per trial used in [13]), since the settings we consider involve more challenging task inference." (A sketch of this trial structure also follows the table.)
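
The "Pseudocode" row above points to the paper's Algorithm 2, task acquisition via discriminative clustering. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes trajectory frames have already been embedded into feature vectors, uses scikit-learn's GaussianMixture as the clustering model, and uses the cluster log-posterior as the per-task reward; the paper's exact objective and E-step differ in detail.

```python
# Sketch of task acquisition via discriminative clustering (cf. Algorithm 2).
# Assumptions: `trajectory_features` are precomputed embeddings of visited
# states; a diagonal Gaussian mixture stands in for the paper's task model.
import numpy as np
from sklearn.mixture import GaussianMixture


def acquire_tasks(trajectory_features, num_tasks=8, seed=0):
    """Cluster visited states into `num_tasks` latent tasks and return the model."""
    mixture = GaussianMixture(n_components=num_tasks,
                              covariance_type="diag",
                              random_state=seed)
    mixture.fit(trajectory_features)
    return mixture


def task_reward(mixture, features, task_id):
    """Reward states for being discriminable as belonging to `task_id`.

    Here the reward is the log-posterior log q(z = task_id | s), a common
    choice for discriminator-based skill rewards; it is an illustrative
    stand-in for the reward defined in the paper.
    """
    log_posteriors = np.log(mixture.predict_proba(features) + 1e-8)
    return log_posteriors[:, task_id]
```

In the alternating scheme the paper describes, a model like this would be refit on experience collected by the current meta-policy, and the induced per-task rewards would then drive the meta-RL update.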
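
The "Experiment Setup" row states that RL2 with a PPO optimizer is used with four episodes per trial. The sketch below only illustrates what "four episodes per trial" means in an RL2-style loop: a recurrent policy carries its hidden state across the episodes of one trial so it can infer the task, and the state resets between trials. The `env` and `policy` interfaces are hypothetical placeholders, not the authors' code.

```python
# Sketch of the RL^2 trial structure: one task per trial, several episodes,
# recurrent state persisting across episodes within the trial.
EPISODES_PER_TRIAL = 4  # the paper uses 4 (vs. 2 in the original RL^2 work)


def run_trial(env, policy):
    hidden = policy.initial_state()      # recurrent state is reset once per trial
    env.sample_task()                    # one task is held fixed for the whole trial
    trajectories = []
    for _ in range(EPISODES_PER_TRIAL):
        obs, done = env.reset(), False   # new episode, same task, same hidden state
        episode = []
        while not done:
            action, hidden = policy.step(obs, hidden)
            obs, reward, done, _ = env.step(action)
            episode.append((obs, action, reward))
        trajectories.append(episode)
    return trajectories                  # one trial's data for the PPO update
```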