Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards
Authors: Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks. |
| Researcher Affiliation | Collaboration | Allan Zhou & Eric Jang Google Brain EMAIL Daniel Kappler & Alex Herzog X EMAIL Mohi Khansari, Paul Wohlhart, Yunfei Bai & Mrinal Kalakrishnan X EMAIL Sergey Levine Google Brain, UC Berkeley EMAIL Chelsea Finn Google Brain, Stanford EMAIL |
| Pseudocode | Yes | Algorithm 1 Watch-Try-Learn: Meta-training |
| Open Source Code | Yes | We have published videos of our experimental results and the experiment model code (https://github.com/google-research/tensor2robot/tree/master/research/vrgripper). |
| Open Datasets | No | The paper describes creating custom datasets for the gripper and reaching environments and collecting demonstrations, but does not provide specific access information (link, DOI, or citation to a public source) for these datasets. |
| Dataset Splits | Yes | We held out 40 tasks corresponding to 5 sets of kitchenware objects for our meta-validation dataset, which we used for hyperparameter selection. Similarly, we selected and held out 5 object sets of 40 tasks for our meta-test dataset, which we used for final evaluations. |
| Hardware Specification | Yes | We trained all policies using the ADAM optimizer (Kingma & Ba, 2015), on varying numbers of Nvidia Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions software like 'Bullet physics engine', 'TFAgents', and 'ADAM optimizer' but does not specify their version numbers. |
| Experiment Setup | Yes | We trained all policies for 50000 steps using a batch size of 100 tasks and a .001 learning rate, using a single GPU operating at 25 gradient steps per second. |