RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
Authors: Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, Priya Sundaresan, Peng Xu, Hao Su, Karol Hausman, Chelsea Finn, Quan Vuong, Ted Xiao
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate RT-Trajectory at scale on a variety of real-world robotic tasks, and find that RT-Trajectory is able to perform a wider range of tasks compared to language-conditioned and goal-conditioned policies, when provided the same training data. Our real robot experiments aim to study the following questions: |
| Researcher Affiliation | Collaboration | 1Google DeepMind, 2University of California San Diego, 3Stanford University, 4Intrinsic |
| Pseudocode | No | The paper describes procedures in text but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Evaluation videos can be found at https://rt-trajectory.github.io/. No explicit statement about providing open-source code for the methodology was found. |
| Open Datasets | Yes | We use the RT-1 (Brohan et al., 2023b) demonstration dataset for training. |
| Dataset Splits | No | The paper refers to a training dataset and unseen skills for evaluation but does not provide specific percentages or counts for training, validation, and test splits, nor does it refer to predefined standard splits for reproducibility. |
| Hardware Specification | No | The paper describes the robot hardware ("mobile manipulator robot from Everyday Robots", "7 degree-of-freedom arm", "two-fingered gripper", "mobile base") but does not specify the computational hardware (e.g., GPU models, CPU types) used for training or inference of the models. |
| Software Dependencies | No | The paper mentions "Mediapipe" and "OpenAI" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We then learn a policy π represented by a Transformer (Vaswani et al., 2017) using Behavior Cloning (Pomerleau, 1988) following the RT-1 framework (Brohan et al., 2023b), by minimizing the log-likelihood of predicted actions a_t given the input image and trajectory sketch. To support trajectory conditioning, we modify the RT-1 architecture as follows. The trajectory sketch is concatenated with each RGB image along the feature dimension in the input sequence (a history of 6 images), which is processed by the image tokenizer (an ImageNet-pretrained EfficientNet-B3). For the additional input channels to the image tokenizer, we initialize the new weights in the first convolution layer with all zeros. Since the language instruction is not used, we remove the FiLM layers used in the original RT-1. |
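
The Experiment Setup row describes the one architectural change needed for trajectory conditioning: the trajectory sketch is concatenated channel-wise with each RGB frame, and the extra input channels of the image tokenizer's first convolution are zero-initialized so the pretrained RGB weights are preserved at initialization. The paper's RT-1 code is not released, so the following is only a minimal PyTorch sketch of that conditioning scheme; the helper name `expand_first_conv`, the stand-in stem layer, and the channel counts are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch, not the authors' RT-1 code) of channel-wise
# trajectory conditioning with zero-initialized new input weights.
import torch
import torch.nn as nn


def expand_first_conv(conv: nn.Conv2d, extra_in_channels: int) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `extra_in_channels` additional
    input channels, with the kernel weights for the new channels set to zero
    so the layer behaves like the pretrained one at initialization."""
    new_conv = nn.Conv2d(
        conv.in_channels + extra_in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()                               # new channels start at zero
        new_conv.weight[:, : conv.in_channels] = conv.weight  # keep pretrained RGB weights
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Example: a stem convolution pretrained on 3-channel RGB (a stand-in for the
# EfficientNet-B3 tokenizer's first layer), extended to also take a 3-channel
# rendered trajectory sketch, i.e. 6 input channels in total.
pretrained_stem = nn.Conv2d(3, 40, kernel_size=3, stride=2, padding=1)
conditioned_stem = expand_first_conv(pretrained_stem, extra_in_channels=3)

rgb = torch.randn(1, 3, 300, 300)      # camera frame
sketch = torch.randn(1, 3, 300, 300)   # rendered trajectory sketch
tokens = conditioned_stem(torch.cat([rgb, sketch], dim=1))
print(tokens.shape)
```

Zero-initializing the new weights means the conditioned tokenizer initially ignores the sketch channels and reproduces the pretrained features, letting training gradually learn how to use the trajectory signal; this matches the paper's stated initialization choice, while everything else in the snippet is a simplified stand-in.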