Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards

Authors: Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn

ICLR 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks." |
| Researcher Affiliation | Collaboration | Allan Zhou & Eric Jang (Google Brain, EMAIL); Daniel Kappler & Alex Herzog (X, EMAIL); Mohi Khansari, Paul Wohlhart, Yunfei Bai & Mrinal Kalakrishnan (X, EMAIL); Sergey Levine (Google Brain, UC Berkeley, EMAIL); Chelsea Finn (Google Brain, Stanford, EMAIL) |
| Pseudocode | Yes | "Algorithm 1 Watch-Try-Learn: Meta-training" |
| Open Source Code | Yes | "We have published videos of our experimental results and the experiment model code." https://github.com/google-research/tensor2robot/tree/master/research/vrgripper |
| Open Datasets | No | The paper describes creating custom datasets for the gripper and reaching environments and collecting demonstrations, but does not provide specific access information (link, DOI, or citation to a public source) for these datasets. |
| Dataset Splits | Yes | "We held out 40 tasks corresponding to 5 sets of kitchenware objects for our meta-validation dataset, which we used for hyperparameter selection. Similarly, we selected and held out 5 object sets of 40 tasks for our meta-test dataset, which we used for final evaluations." |
| Hardware Specification | Yes | "We trained all policies using the ADAM optimizer (Kingma & Ba, 2015), on varying numbers of Nvidia Tesla P100 GPUs." |
| Software Dependencies | No | The paper mentions software such as the Bullet physics engine, TFAgents, and the ADAM optimizer, but does not specify version numbers for any of them. |
| Experiment Setup | Yes | "We trained all policies for 50000 steps using a batch size of 100 tasks and a .001 learning rate, using a single GPU operating at 25 gradient steps per second." |