Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Unsupervised Curricula for Visual Meta-Reinforcement Learning
Authors: Allan Jabri, Kyle Hsu, Abhishek Gupta, Ben Eysenbach, Sergey Levine, Chelsea Finn
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment in visual navigation and visuomotor control domains to study the following questions: ... In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions. |
| Researcher Affiliation | Academia | Allan Jabri (α), Kyle Hsu (β), Benjamin Eysenbach (γ), Abhishek Gupta (α), Sergey Levine (α), Chelsea Finn (δ). α: UC Berkeley; β: University of Toronto; γ: Carnegie Mellon University; δ: Stanford University |
| Pseudocode | Yes | Pseudocode for our method is presented in Algorithm 1. Algorithm 2: Task Acquisition via Discriminative Clustering |
| Open Source Code | No | Videos are available at the project website https://sites.google.com/view/carml. This link is for videos and does not explicitly state that source code for the methodology is provided. |
| Open Datasets | Yes | We experiment in visual navigation and visuomotor control domains... The first domain we consider is first-person visual navigation in ViZDoom [32]... We consider a simulated Sawyer arm interacting with an object in MuJoCo [60]... |
| Dataset Splits | No | The paper describes meta-training and testing on different task distributions (e.g., 'unsupervised meta-training' and 'test task distributions'), but does not specify explicit percentages or counts for training, validation, and test dataset splits in the conventional sense of supervised learning. |
| Hardware Specification | No | The paper does not specify any details about the hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like 'RL2 algorithm [13]' and 'PPO [53] optimizer', but it does not provide specific version numbers for these or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | Other experimental details are explained in more detail in Appendix B. Meta-RL. CARML is agnostic to the meta-RL algorithm used in the M-step. We use the RL2 algorithm [13], which has previously been evaluated on simpler visual meta-RL domains, with a PPO [53] optimizer. Unless otherwise stated, we use four episodes per trial (compared to the two episodes per trial used in [13]), since the settings we consider involve more challenging task inference. |