PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
Authors: Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our PIVOT-R outperforms state-of-the-art (SoTA) open-source models on the SeaWave benchmark, achieving an average relative improvement of 19.45% across four levels of instruction tasks. Moreover, compared to the synchronously executed PIVOT-R, the execution efficiency of PIVOT-R with AHE is increased by 28-fold, with only a 2.9% drop in performance. These results provide compelling evidence that our PIVOT-R can significantly improve both the performance and efficiency of robotic manipulation. |
| Researcher Affiliation | Collaboration | Kaidong Zhang¹, Pengzhen Ren², Bingqian Lin¹, Junfan Lin², Shikui Ma³, Hang Xu⁴, Xiaodan Liang¹,²; ¹Sun Yat-sen University, ²Peng Cheng Laboratory, ³Dataa Robotics, ⁴Huawei Noah's Ark Lab |
| Pseudocode | No | The paper describes its architecture and processes using text and diagrams (Figure 1, Figure 2), but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | https://abliao.github.io/PIVOT-R; the NeurIPS checklist also states: "Our work will be open source after acceptance." |
| Open Datasets | Yes | We choose SeaWave [42], an open-source benchmark to learn multi-level instruction tasks, as our experimental platform, and use the corresponding data as demonstration data for imitation learning. ... The SeaWave dataset contains a total of 13K data covering four different levels of language instructions. |
| Dataset Splits | No | The paper states 'We train on this dataset and test on a specially divided test set' but does not explicitly mention or detail a validation dataset split. |
| Hardware Specification | Yes | All experiments involved in this paper are conducted on a single GPU server with 6 NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions software such as LLaVA and CLIP, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The hyperparameter settings for PIVOT-R are shown in Table 6 (see the configuration sketch after the table): LS = 12, LA = 3, Image encoder = CLIP-ViT-B/32, Text encoder = CLIP-ViT-B/32, Transformer heads = 8, Embedded dims = 512, Learning rate = 3e-5, Dropout = 0.1. |
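
The hyperparameters quoted from Table 6 can be collected into a single training configuration. Below is a minimal sketch in Python; the dictionary keys (e.g. `L_S`, `embed_dim`) are illustrative names chosen here, not identifiers from the PIVOT-R codebase, and only the values come from the paper.

```python
# Minimal sketch of the PIVOT-R hyperparameters reported in Table 6.
# Key names are assumptions for illustration; the values are taken from the paper.
pivot_r_hparams = {
    "L_S": 12,                         # value of LS as reported in Table 6
    "L_A": 3,                          # value of LA as reported in Table 6
    "image_encoder": "CLIP-ViT-B/32",  # image encoder backbone
    "text_encoder": "CLIP-ViT-B/32",   # text encoder backbone
    "transformer_heads": 8,            # attention heads per transformer layer
    "embed_dim": 512,                  # embedding dimension
    "learning_rate": 3e-5,
    "dropout": 0.1,
}

if __name__ == "__main__":
    # Quick sanity check: print the configuration key/value pairs.
    for key, value in pivot_r_hparams.items():
        print(f"{key}: {value}")
```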