Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards
Authors: Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks. |
| Researcher Affiliation | Collaboration | Allan Zhou & Eric Jang (Google Brain) {allanz,ejang}@google.com; Daniel Kappler & Alex Herzog (X) {kappler,alexherzog}@x.team; Mohi Khansari, Paul Wohlhart, Yunfei Bai & Mrinal Kalakrishnan (X) {khansari,wohlhart,yunfeibai,kalakris}@x.team; Sergey Levine (Google Brain, UC Berkeley) slevine@google.com; Chelsea Finn (Google Brain, Stanford) chelseaf@google.com |
| Pseudocode | Yes | Algorithm 1 Watch-Try-Learn: Meta-training |
| Open Source Code | Yes | We have published videos of our experimental results and the experiment model code. https://github.com/google-research/tensor2robot/tree/master/research/vrgripper |
| Open Datasets | No | The paper describes creating custom datasets for the gripper and reaching environments and collecting demonstrations, but does not provide specific access information (link, DOI, or citation to a public source) for these datasets. |
| Dataset Splits | Yes | We held out 40 tasks corresponding to 5 sets of kitchenware objects for our meta-validation dataset, which we used for hyperparameter selection. Similarly, we selected and held out 5 object sets of 40 tasks for our meta-test dataset, which we used for final evaluations. |
| Hardware Specification | Yes | We trained all policies using the ADAM optimizer (Kingma & Ba, 2015), on varying numbers of Nvidia Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions software like 'Bullet physics engine', 'TFAgents', and 'ADAM optimizer' but does not specify their version numbers. |
| Experiment Setup | Yes | We trained all policies for 50000 steps using a batch size of 100 tasks and a .001 learning rate, using a single GPU operating at 25 gradient steps per second. |
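
The "Pseudocode" and "Experiment Setup" rows above reference the paper's Algorithm 1 (Watch-Try-Learn meta-training) and its reported hyperparameters. The sketch below is a minimal, hedged illustration of what such a meta-training loop can look like; it is not the authors' implementation (their code is in the tensor2robot repository linked above). Every helper, the toy linear policies, and the observation/action sizes are hypothetical stand-ins, and plain SGD replaces the Adam optimizer for brevity; only the constants (50000 steps, 100 tasks per batch, 0.001 learning rate) come from the quoted setup.

```python
"""Hedged sketch of a Watch-Try-Learn-style meta-training step (toy stand-ins only)."""
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT = 8, 2            # toy observation/action sizes (assumed, not from the paper)

NUM_STEPS = 50_000         # reported meta-training steps
TASKS_PER_BATCH = 100      # reported batch size, measured in tasks
LEARNING_RATE = 1e-3       # reported learning rate (the paper uses Adam)


def sample_tasks(n):
    """Hypothetical task sampler: each task provides a conditioning demo and a
    second 'target' demo of the same task to imitate."""
    def demo():
        return rng.normal(size=(10, OBS)), rng.normal(size=(10, ACT))
    return [{"condition_demo": demo(), "target_demo": demo()} for _ in range(n)]


def bc_loss_and_grad(policy, demo):
    """Behavioral-cloning (squared-error) loss of a linear policy on one demo,
    plus its gradient with respect to the policy weights."""
    obs, actions = demo
    err = obs @ policy - actions
    return float(np.mean(err ** 2)), 2.0 * obs.T @ err / err.size


def collect_trial_episode(policy, task):
    """Stand-in for rolling out the trial policy in the environment."""
    obs = rng.normal(size=(10, OBS))
    return obs, obs @ policy


# Toy linear stand-ins for the trial policy (conditioned on one demo) and the
# retrial policy (conditioned on the demo plus the trial experience).
trial_policy = 0.01 * rng.normal(size=(OBS, ACT))
retrial_policy = 0.01 * rng.normal(size=(OBS, ACT))

for step in range(3):  # a few steps for illustration; the paper trains for NUM_STEPS
    tasks = sample_tasks(TASKS_PER_BATCH)
    trial_grad = np.zeros_like(trial_policy)
    retrial_grad = np.zeros_like(retrial_policy)
    for task in tasks:
        # "Watch": the trial policy sees one demo and is trained to imitate a
        # second demo of the same task.
        _, g_t = bc_loss_and_grad(trial_policy, task["target_demo"])
        trial_grad += g_t
        # "Try": roll out the trial policy to gather trial experience.
        trial_episode = collect_trial_episode(trial_policy, task)
        # "Learn": the retrial policy conditions on the demo and the trial
        # episode (conditioning itself is elided in this toy sketch) and is
        # likewise trained by imitating the target demo.
        _, g_r = bc_loss_and_grad(retrial_policy, task["target_demo"])
        retrial_grad += g_r
    trial_policy -= LEARNING_RATE * trial_grad / TASKS_PER_BATCH
    retrial_policy -= LEARNING_RATE * retrial_grad / TASKS_PER_BATCH
```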