Know Thyself: Transferable Visual Control Policies Through Robot-Awareness

Authors: Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experiments on tabletop manipulation tasks with simulated and real robots demonstrate that these plug-in improvements dramatically boost the transferability of visual model-based RL policies, even permitting zero-shot transfer of visual manipulation skills onto new robots." |
| Researcher Affiliation | Academia | GRASP Lab, University of Pennsylvania ({hued, huangkun, oleh, dineshj}@seas.upenn.edu) |
| Pseudocode | Yes | Algorithm 1 TRAIN, Algorithm 2 TEST, and Algorithm 3 (PyTorch code for the L_w loss; see the sketch after this table). |
| Open Source Code | Yes | "To ensure reproducibility, we will release the codebase that contains our video prediction and control algorithms, as well as weights for our trained models." The code is linked from the project website: https://www.seas.upenn.edu/~hued/rac |
| Open Datasets | Yes | "Models were pretrained for 150,000 gradient steps on the RoboNet dataset with the Adam optimizer..." and "For datasets, we will release our subset of RoboNet labeled with robot masks, and our WidowX video dataset." |
| Dataset Splits | No | The paper describes training and evaluation procedures, including sequence counts and evaluation metrics, but it does not give explicit percentages or counts for train/validation/test splits. It mentions, for instance, "train on 10k trajectories" and "evaluate on 1000 trajectories", but these are dataset sizes rather than formal splits. |
| Hardware Specification | No | The paper does not specify the computational hardware (e.g., GPU or CPU models, memory) used to run the experiments or train the models. |
| Software Dependencies | No | The paper mentions software such as PyRobot, the MoveIt! motion-planning ROS package, MuJoCo, Label Studio, and PyTorch, but does not provide version numbers for any of them. |
| Experiment Setup | Yes | "Models were pretrained for 150,000 gradient steps on the RoboNet dataset with the Adam optimizer, learning rate 3e-4 and batch size of 16. ... For fine-tuning, all models were trained on the fine-tune dataset for 10,000 gradient steps with learning rate of 1e-4, batch size of 10..." and "For the few-shot WidowX200 experiment, the CEM action selection generates 300 action trajectories of length 5, selects the top 10 sequences, and optimizes the distribution for 10 iterations. For the zero-shot Franka experiment, the CEM action selection generates 100 action trajectories of length 5, selects the top 10 sequences, and optimizes the distribution for 3 iterations." A CEM planner sketch also follows the table. |
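
The paper's Algorithm 3 gives PyTorch code for the L_w ("world") loss, and the released codebase is the authoritative version. As a rough illustration of the underlying idea (supervising video prediction only on non-robot pixels, using the released robot masks), a minimal sketch is below; the function name `world_loss`, its argument layout, and the plain MSE formulation are assumptions, not the authors' code.

```python
import torch.nn.functional as F

def world_loss(pred, target, robot_mask):
    """Robot-masked reconstruction loss (assumed form, not Algorithm 3 verbatim).

    pred, target: (B, C, H, W) predicted and ground-truth frames.
    robot_mask:   (B, 1, H, W) binary mask, 1 where the robot is visible.
    """
    world = 1.0 - robot_mask  # keep only "world" (non-robot) pixels
    return F.mse_loss(pred * world, target * world)
```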
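
The CEM hyperparameters quoted in the Experiment Setup row (population 300 or 100, horizon 5, 10 elites, 10 or 3 iterations) fully specify a standard cross-entropy-method planner. For reference, a generic sketch with the few-shot WidowX200 settings as defaults follows; `cem_plan` and `cost_fn` are hypothetical names, and the paper's planner scores candidates by rolling out its learned visual dynamics model.

```python
import numpy as np

def cem_plan(cost_fn, action_dim, horizon=5, pop_size=300, elites=10, iters=10):
    """Cross-entropy method action selection (generic sketch).

    cost_fn maps a (pop_size, horizon, action_dim) batch of action
    sequences to a (pop_size,) vector of costs, e.g. by rolling out a
    learned visual dynamics model and scoring the predicted frames.
    """
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action trajectories around the current distribution.
        samples = mean + std * np.random.randn(pop_size, horizon, action_dim)
        costs = cost_fn(samples)
        # Refit the sampling distribution to the lowest-cost (elite) sequences.
        elite = samples[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean  # planned sequence; typically only the first action is executed
```

The zero-shot Franka settings from the paper would correspond to `cem_plan(cost_fn, action_dim, horizon=5, pop_size=100, elites=10, iters=3)`, where `action_dim` depends on the robot's action space.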