RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
Authors: Qi Lv, Hao Li, Xiang Deng, Rui Shao, Michael Y. Wang, Liqiang Nie
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of RoboMP$^2$ on both the VIMA benchmark and real-world tasks, with around 10% improvement over the baselines. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen); ²School of Engineering, Great Bay University; ³School of Computing and Information Technology, Great Bay University. |
| Pseudocode | No | The paper includes figures (Figure 6, Figure 7) that show templates and examples of code-like structures for the generator, but these are not formally labeled as 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper includes a link 'RoboMP2.github.io' on the first page. However, upon checking the linked website (https://robomp2.github.io/), it states 'Code will be released soon', indicating the code is not yet publicly available. |
| Open Datasets | Yes | We employ VIMA (Jiang et al., 2023) as the test benchmark which encompasses 17 tasks ranging from L1-level to L4-level difficulty. |
| Dataset Splits | No | The paper describes VIMA-Bench with L1-L4 levels of difficulty, but it does not specify explicit numerical percentages or counts for training, validation, and test splits within the main text or appendices. |
| Hardware Specification | Yes | The overall training time is around 24 hours on an 8×A100-80G-SXM4 platform. |
| Software Dependencies | No | The paper mentions software components like 'ViT', 'flan-t5-xl', 'EVA-CLIP/g', 'GPT4/GPT3.5', 'GPT4V', and 'AdamW optimizer' but does not provide specific version numbers for these software dependencies or libraries. |
| Experiment Setup | Yes | We set the number of epochs to 10, the batch size to 128, and the learning rates of the fusion module and LoRA module to 3e-5 and 1e-4, respectively. We adopt the AdamW optimizer and the cosine decay learning rate schedule. |
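
The reported experiment setup translates directly into an optimizer configuration. Below is a minimal PyTorch sketch of that configuration, assuming per-parameter-group learning rates for the fusion and LoRA modules and a cosine decay schedule; the `fusion_module` and `lora_module` objects are hypothetical stand-ins, since the paper's code has not yet been released.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 10       # reported number of epochs
BATCH_SIZE = 128  # reported batch size

# Hypothetical stand-ins for the paper's multimodal fusion layers and LoRA adapters.
fusion_module = torch.nn.Linear(768, 768)
lora_module = torch.nn.Linear(768, 16)

# AdamW with the two reported learning rates, one per parameter group.
optimizer = AdamW(
    [
        {"params": fusion_module.parameters(), "lr": 3e-5},  # fusion module
        {"params": lora_module.parameters(), "lr": 1e-4},    # LoRA module
    ]
)

# Cosine decay of the learning rate over the training run.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... forward pass, loss computation, and backward pass would go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```

This sketch only captures the hyperparameters quoted in the table; model architecture, data loading, and the loss function are not specified in enough detail in the paper to reconstruct here.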