Unsupervised Curricula for Visual Meta-Reinforcement Learning
Authors: Allan Jabri, Kyle Hsu, Ben Eysenbach, Abhishek Gupta, Sergey Levine, Chelsea Finn
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment in visual navigation and visuomotor control domains to study the following questions: ... In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions. |
| Researcher Affiliation | Academia | Allan Jabri^α, Kyle Hsu^β, Benjamin Eysenbach^γ, Abhishek Gupta^α, Sergey Levine^α, Chelsea Finn^δ (α: UC Berkeley, β: University of Toronto, γ: Carnegie Mellon University, δ: Stanford University) |
| Pseudocode | Yes | Pseudocode for our method is presented in Algorithm 1. Algorithm 2: Task Acquisition via Discriminative Clustering |
| Open Source Code | No | Videos are available at the project website https://sites.google.com/view/carml. This link is for videos and does not explicitly state that source code for the methodology is provided. |
| Open Datasets | Yes | We experiment in visual navigation and visuomotor control domains... The first domain we consider is first-person visual navigation in ViZDoom [32]... We consider a simulated Sawyer arm interacting with an object in MuJoCo [60]... |
| Dataset Splits | No | The paper describes meta-training and testing on different task distributions (e.g., 'unsupervised meta-training' and 'test task distributions'), but does not specify explicit percentages or counts for training, validation, and test dataset splits in the conventional sense of supervised learning. |
| Hardware Specification | No | The paper does not specify any details about the hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like 'RL2 algorithm [13]' and 'PPO [53] optimizer', but it does not provide specific version numbers for these or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | Other experimental details are explained in more detail in Appendix B. Meta-RL. CARML is agnostic to the meta-RL algorithm used in the M-step. We use the RL2 algorithm [13], which has previously been evaluated on simpler visual meta-RL domains, with a PPO [53] optimizer. Unless otherwise stated, we use four episodes per trial (compared to the two episodes per trial used in [13]), since the settings we consider involve more challenging task inference. (A hedged sketch of this setup appears below the table.) |
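
The "Pseudocode" and "Experiment Setup" rows reference the paper's two alternating components: task acquisition via discriminative clustering (Algorithm 2) and RL2 meta-training with a PPO optimizer over four-episode trials. The block below is a minimal, hypothetical sketch of that alternation, not the authors' implementation: it assumes a Gaussian mixture over pre-computed observation embeddings as a stand-in for the paper's clustering model, and it replaces the PPO update with a placeholder trial loop that only illustrates how a recurrent state would persist across the four episodes of a trial. All function names, constants, and the toy reward are illustrative assumptions.

```python
# Hypothetical sketch of a CARML-style alternation: an E-step that acquires
# tasks by clustering observation embeddings, and an M-step that treats each
# cluster as a task for an RL2-style recurrent policy run over multi-episode
# trials. The mixture model, embeddings, and trial loop below are stand-ins,
# not the paper's implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

RNG = np.random.default_rng(0)
EMBED_DIM = 8           # dimensionality of (assumed) observation embeddings
NUM_TASKS = 5           # number of mixture components / acquired "tasks"
EPISODES_PER_TRIAL = 4  # the paper reports four episodes per trial
EPISODE_LEN = 20

def collect_embeddings(num_steps: int) -> np.ndarray:
    """Stand-in for embedding observations gathered by the current policy."""
    return RNG.normal(size=(num_steps, EMBED_DIM))

def e_step(embeddings: np.ndarray) -> GaussianMixture:
    """Task acquisition via clustering: fit a mixture whose components define
    the task distribution used in the next M-step."""
    mixture = GaussianMixture(n_components=NUM_TASKS, random_state=0)
    mixture.fit(embeddings)
    return mixture

def task_reward(mixture: GaussianMixture, obs_embedding: np.ndarray, task: int) -> float:
    """Toy self-supervised reward: how strongly the clustering model attributes
    the current observation to the sampled task component."""
    posterior = mixture.predict_proba(obs_embedding[None])[0]
    return float(np.log(posterior[task] + 1e-8))

def m_step(mixture: GaussianMixture, num_trials: int = 10) -> float:
    """Meta-RL step (placeholder): each trial samples a task, then runs several
    episodes while a recurrent state persists across episode boundaries, as in
    RL2. A real implementation would update the policy with PPO here."""
    total_return = 0.0
    for _ in range(num_trials):
        task = RNG.integers(NUM_TASKS)
        hidden = np.zeros(EMBED_DIM)            # recurrent state carried across episodes
        for _ in range(EPISODES_PER_TRIAL):
            for _ in range(EPISODE_LEN):
                obs = RNG.normal(size=EMBED_DIM)   # stand-in observation embedding
                hidden = 0.9 * hidden + 0.1 * obs  # toy recurrent update
                total_return += task_reward(mixture, obs, task)
    return total_return / num_trials

if __name__ == "__main__":
    for outer_iter in range(3):                  # alternate E- and M-steps
        mixture = e_step(collect_embeddings(2000))
        avg_return = m_step(mixture)
        print(f"iter {outer_iter}: average trial return {avg_return:.2f}")
```

In the method as described, the M-step would update a recurrent meta-policy with PPO on the mixture-defined rewards, and the E-step would then refit the clustering model on trajectories gathered by that updated policy; the sketch only mirrors that control flow.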