Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards
Authors: Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks. |
| Researcher Affiliation | Collaboration | Allan Zhou & Eric Jang (Google Brain) {allanz,ejang}@google.com; Daniel Kappler & Alex Herzog (X) {kappler,alexherzog}@x.team; Mohi Khansari, Paul Wohlhart, Yunfei Bai & Mrinal Kalakrishnan (X) {khansari,wohlhart,yunfeibai,kalakris}@x.team; Sergey Levine (Google Brain, UC Berkeley) slevine@google.com; Chelsea Finn (Google Brain, Stanford) chelseaf@google.com |
| Pseudocode | Yes | Algorithm 1 Watch-Try-Learn: Meta-training |
| Open Source Code | Yes | We have published videos of our experimental results and the experiment model code. https://github.com/google-research/tensor2robot/tree/master/research/vrgripper |
| Open Datasets | No | The paper describes creating custom datasets for the gripper and reaching environments and collecting demonstrations, but does not provide specific access information (link, DOI, or citation to a public source) for these datasets. |
| Dataset Splits | Yes | We held out 40 tasks corresponding to 5 sets of kitchenware objects for our meta-validation dataset, which we used for hyperparameter selection. Similarly, we selected and held out 5 object sets of 40 tasks for our meta-test dataset, which we used for final evaluations. |
| Hardware Specification | Yes | We trained all policies using the ADAM optimizer (Kingma & Ba, 2015), on varying numbers of Nvidia Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions software like 'Bullet physics engine', 'TFAgents', and 'ADAM optimizer' but does not specify their version numbers. |
| Experiment Setup | Yes | We trained all policies for 50000 steps using a batch size of 100 tasks and a .001 learning rate, using a single GPU operating at 25 gradient steps per second. |
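
The "Pseudocode" and "Experiment Setup" rows above reference the paper's Algorithm 1 (Watch-Try-Learn meta-training) and its reported hyperparameters. The sketch below is a minimal, hedged illustration of what such a meta-training loop can look like; it is not the authors' implementation (their code is in the tensor2robot repository linked above). Every helper, the toy linear policies, and the observation/action sizes are hypothetical stand-ins, and plain SGD replaces the Adam optimizer for brevity; only the constants (50000 steps, 100 tasks per batch, 0.001 learning rate) come from the quoted setup.

```python
"""Hedged sketch of a Watch-Try-Learn-style meta-training step (toy stand-ins only)."""
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT = 8, 2            # toy observation/action sizes (assumed, not from the paper)

NUM_STEPS = 50_000         # reported meta-training steps
TASKS_PER_BATCH = 100      # reported batch size, measured in tasks
LEARNING_RATE = 1e-3       # reported learning rate (the paper uses Adam)


def sample_tasks(n):
    """Hypothetical task sampler: each task provides a conditioning demo and a
    second 'target' demo of the same task to imitate."""
    def demo():
        return rng.normal(size=(10, OBS)), rng.normal(size=(10, ACT))
    return [{"condition_demo": demo(), "target_demo": demo()} for _ in range(n)]


def bc_loss_and_grad(policy, demo):
    """Behavioral-cloning (squared-error) loss of a linear policy on one demo,
    plus its gradient with respect to the policy weights."""
    obs, actions = demo
    err = obs @ policy - actions
    return float(np.mean(err ** 2)), 2.0 * obs.T @ err / err.size


def collect_trial_episode(policy, task):
    """Stand-in for rolling out the trial policy in the environment."""
    obs = rng.normal(size=(10, OBS))
    return obs, obs @ policy


# Toy linear stand-ins for the trial policy (conditioned on one demo) and the
# retrial policy (conditioned on the demo plus the trial experience).
trial_policy = 0.01 * rng.normal(size=(OBS, ACT))
retrial_policy = 0.01 * rng.normal(size=(OBS, ACT))

for step in range(3):  # a few steps for illustration; the paper trains for NUM_STEPS
    tasks = sample_tasks(TASKS_PER_BATCH)
    trial_grad = np.zeros_like(trial_policy)
    retrial_grad = np.zeros_like(retrial_policy)
    for task in tasks:
        # "Watch": the trial policy sees one demo and is trained to imitate a
        # second demo of the same task.
        _, g_t = bc_loss_and_grad(trial_policy, task["target_demo"])
        trial_grad += g_t
        # "Try": roll out the trial policy to gather trial experience.
        trial_episode = collect_trial_episode(trial_policy, task)
        # "Learn": the retrial policy conditions on the demo and the trial
        # episode (conditioning itself is elided in this toy sketch) and is
        # likewise trained by imitating the target demo.
        _, g_r = bc_loss_and_grad(retrial_policy, task["target_demo"])
        retrial_grad += g_r
    trial_policy -= LEARNING_RATE * trial_grad / TASKS_PER_BATCH
    retrial_policy -= LEARNING_RATE * retrial_grad / TASKS_PER_BATCH
```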