Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards
Authors: Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks. |
| Researcher Affiliation | Collaboration | Allan Zhou & Eric Jang (Google Brain); Daniel Kappler & Alex Herzog (X); Mohi Khansari, Paul Wohlhart, Yunfei Bai & Mrinal Kalakrishnan (X); Sergey Levine (Google Brain, UC Berkeley); Chelsea Finn (Google Brain, Stanford) |
| Pseudocode | Yes | Algorithm 1 Watch-Try-Learn: Meta-training |
| Open Source Code | Yes | We have published videos of our experimental results1 and the experiment model code2. https://github.com/google-research/tensor2robot/tree/master/research/vrgripper |
| Open Datasets | No | The paper describes creating custom datasets for the gripper and reaching environments and collecting demonstrations, but does not provide specific access information (link, DOI, or citation to a public source) for these datasets. |
| Dataset Splits | Yes | We held out 40 tasks corresponding to 5 sets of kitchenware objects for our meta-validation dataset, which we used for hyperparameter selection. Similarly, we selected and held out 5 object sets of 40 tasks for our meta-test dataset, which we used for final evaluations. |
| Hardware Specification | Yes | We trained all policies using the ADAM optimizer (Kingma & Ba, 2015), on varying numbers of Nvidia Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions software like 'Bullet physics engine', 'TFAgents', and 'ADAM optimizer' but does not specify their version numbers. |
| Experiment Setup | Yes | We trained all policies for 50000 steps using a batch size of 100 tasks and a 0.001 learning rate, using a single GPU operating at 25 gradient steps per second. |
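The dataset-split and experiment-setup rows above can be summarized in a minimal sketch. This is illustrative only: the class and function names (`MetaTrainConfig`, `holdout_split`) are hypothetical and not taken from the paper's released code, and the 8-tasks-per-set figure is inferred from the quoted "40 tasks corresponding to 5 sets" of objects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaTrainConfig:
    """Training hyperparameters quoted in the table above."""
    train_steps: int = 50_000      # total gradient steps
    tasks_per_batch: int = 100     # batch size, in tasks
    learning_rate: float = 1e-3    # Adam learning rate
    optimizer: str = "adam"

def holdout_split(task_ids, n_val_sets=5, n_test_sets=5, tasks_per_set=8):
    """Hold out whole object sets for meta-validation and meta-test.

    Mirrors the described split: 5 held-out object sets (40 tasks) for
    meta-validation and another 5 sets (40 tasks) for meta-test, with the
    remaining tasks used for meta-training.
    """
    val_n = n_val_sets * tasks_per_set
    test_n = n_test_sets * tasks_per_set
    meta_val = task_ids[:val_n]
    meta_test = task_ids[val_n:val_n + test_n]
    meta_train = task_ids[val_n + test_n:]
    return meta_train, meta_val, meta_test
```

For example, splitting 200 task IDs yields 40 meta-validation tasks, 40 meta-test tasks, and 120 meta-training tasks, matching the holdout sizes reported in the table.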