Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards

Authors: Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn

ICLR 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks." |
| Researcher Affiliation | Collaboration | Allan Zhou and Eric Jang (Google Brain, {allanz,ejang}@google.com); Daniel Kappler and Alex Herzog (X, {kappler,alexherzog}@x.team); Mohi Khansari, Paul Wohlhart, Yunfei Bai, and Mrinal Kalakrishnan (X, {khansari,wohlhart,yunfeibai,kalakris}@x.team); Sergey Levine (Google Brain and UC Berkeley, slevine@google.com); Chelsea Finn (Google Brain and Stanford, chelseaf@google.com) |
| Pseudocode | Yes | "Algorithm 1 Watch-Try-Learn: Meta-training" (a sketch of this loop appears after the table) |
| Open Source Code | Yes | "We have published videos of our experimental results and the experiment model code." https://github.com/google-research/tensor2robot/tree/master/research/vrgripper |
| Open Datasets | No | The paper describes creating custom datasets for the gripper and reaching environments and collecting demonstrations, but does not provide access information (link, DOI, or citation to a public source) for these datasets. |
| Dataset Splits | Yes | "We held out 40 tasks corresponding to 5 sets of kitchenware objects for our meta-validation dataset, which we used for hyperparameter selection. Similarly, we selected and held out 5 object sets of 40 tasks for our meta-test dataset, which we used for final evaluations." (a split sketch appears after the table) |
| Hardware Specification | Yes | "We trained all policies using the ADAM optimizer (Kingma & Ba, 2015), on varying numbers of Nvidia Tesla P100 GPUs." |
| Software Dependencies | No | The paper mentions software such as the Bullet physics engine, TF-Agents, and the Adam optimizer, but does not specify version numbers. |
| Experiment Setup | Yes | "We trained all policies for 50000 steps using a batch size of 100 tasks and a .001 learning rate, using a single GPU operating at 25 gradient steps per second." (a configuration sketch appears after the table) |
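The pseudocode row refers to Algorithm 1 of the paper. Below is a minimal Python sketch of the Watch-Try-Learn meta-training loop as it is described at a high level: per task, imitate the demonstration with a trial policy, execute that policy to collect trial experience, then train a retrial policy conditioned on both. All names here (`update_trial_policy`, `collect_trial`, `update_retrial_policy`) are hypothetical placeholders, not the tensor2robot/vrgripper API.

```python
# Hedged sketch of Algorithm 1 ("Watch-Try-Learn: Meta-training").
# This is a reading of the paper's description, not its implementation.
import random
from typing import Callable, List, Sequence, Tuple

Trajectory = List[Tuple]  # [(observation, action), ...]

def meta_train(
    task_demos: Sequence[Trajectory],
    update_trial_policy: Callable[[Trajectory], float],
    collect_trial: Callable[[Trajectory], Trajectory],
    update_retrial_policy: Callable[[Trajectory, Trajectory], float],
    num_steps: int = 50_000,
    batch_size: int = 100,
) -> None:
    for _ in range(num_steps):
        batch = random.sample(list(task_demos), min(batch_size, len(task_demos)))
        for demo in batch:
            update_trial_policy(demo)           # "watch": imitate the demo
            trial = collect_trial(demo)         # "try": run the trial policy
            update_retrial_policy(demo, trial)  # "learn": condition on both

# Toy usage with no-op stubs, just to show the call shape:
demos = [[(0, 0)]] * 4
meta_train(demos, lambda d: 0.0, lambda d: d, lambda d, t: 0.0,
           num_steps=1, batch_size=2)
```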
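For the dataset-splits row, the quoted protocol holds out whole object sets (5 sets per held-out split, 40 tasks each) so that meta-validation and meta-test tasks involve unseen kitchenware. The sketch below is one hypothetical reconstruction of that grouping; `split_by_object_set` and its input format are illustrative, not from the released code.

```python
# Hypothetical split-by-object-set helper mirroring the quoted protocol.
import random
from collections import defaultdict

def split_by_object_set(tasks, num_val_sets=5, num_test_sets=5, seed=0):
    """tasks: iterable of (task_id, object_set_id) pairs."""
    by_set = defaultdict(list)
    for task_id, object_set in tasks:
        by_set[object_set].append(task_id)
    sets = sorted(by_set)
    random.Random(seed).shuffle(sets)
    val_sets = sets[:num_val_sets]
    test_sets = sets[num_val_sets:num_val_sets + num_test_sets]
    meta_val = [t for s in val_sets for t in by_set[s]]
    meta_test = [t for s in test_sets for t in by_set[s]]
    meta_train = [t for s in sets[num_val_sets + num_test_sets:]
                  for t in by_set[s]]
    return meta_train, meta_val, meta_test

# With 8 tasks per object set, 5 held-out sets give the quoted 40 tasks:
tasks = [(f"task{i}", i // 8) for i in range(20 * 8)]  # 20 sets, illustrative
train, val, test = split_by_object_set(tasks)
assert len(val) == 40 and len(test) == 40
```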
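For the experiment-setup row, here is a minimal sketch of the reported optimization settings using TensorFlow's Adam optimizer (the released code is TensorFlow-based); the actual tensor2robot training binary wires this up with its own model and input pipeline, so treat this only as a summary of the quoted numbers.

```python
# Hedged sketch of the reported optimization setup; values are from the
# quoted text, everything else is illustrative.
import tensorflow as tf

TRAIN_STEPS = 50_000      # "trained all policies for 50000 steps"
TASKS_PER_BATCH = 100     # "batch size of 100 tasks"
LEARNING_RATE = 1e-3      # ".001 learning rate"

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

# At the reported ~25 gradient steps/second on a single GPU,
# 50000 steps take roughly 50000 / 25 / 60 ≈ 33 minutes.
```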