RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Authors: Qi Lv, Hao Li, Xiang Deng, Rui Shao, Michael Y Wang, Liqiang Nie

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of RoboMP² on both the VIMA benchmark and real-world tasks, with around 10% improvement over the baselines.
Researcher Affiliation | Academia | ¹School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen); ²School of Engineering, Great Bay University; ³School of Computing and Information Technology, Great Bay University.
Pseudocode | No | The paper includes figures (Figure 6, Figure 7) that show templates and examples of code-like structures for the generator, but these are not formally labeled as 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper includes the link RoboMP2.github.io on the first page; however, the linked website (https://robomp2.github.io/) states 'Code will be released soon', indicating the code is not yet publicly available.
Open Datasets | Yes | We employ VIMA (Jiang et al., 2023) as the test benchmark, which encompasses 17 tasks ranging from L1-level to L4-level difficulty.
Dataset Splits | No | The paper describes VIMA-Bench with L1- to L4-level difficulty but does not specify explicit numerical percentages or counts for the training, validation, and test splits in the main text or appendices.
Hardware Specification | Yes | The overall training time is around 24 hours on an 8×A100-80G-SXM4 platform.
Software Dependencies | No | The paper mentions software components such as ViT, flan-t5-xl, EVA-CLIP/g, GPT-4/GPT-3.5, GPT-4V, and the AdamW optimizer, but does not provide version numbers for these dependencies or libraries. (A minimal way to record such versions is sketched after the table.)
Experiment Setup | Yes | We set the epochs to 10, the batch size to 128, and the learning rates of the fusion module and LoRA module to 3e-5 and 1e-4, respectively. We adopt the AdamW optimizer and the cosine decay learning schedule. (A hedged configuration sketch follows the table.)
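The Experiment Setup row pins down the optimizer, schedule, epochs, and per-module learning rates but not the surrounding code. Below is a minimal sketch of that configuration, assuming PyTorch; fusion_module, lora_module, and STEPS_PER_EPOCH are hypothetical stand-ins, since the paper does not describe the training loop itself.

    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Hypothetical stand-ins for the paper's fusion and LoRA modules.
    fusion_module = torch.nn.Linear(512, 512)
    lora_module = torch.nn.Linear(512, 512)

    EPOCHS = 10             # as reported
    BATCH_SIZE = 128        # as reported
    STEPS_PER_EPOCH = 1000  # assumption; depends on dataset size

    # Two parameter groups are one way to realize the reported per-module
    # learning rates: 3e-5 for the fusion module, 1e-4 for the LoRA module.
    optimizer = AdamW([
        {"params": fusion_module.parameters(), "lr": 3e-5},
        {"params": lora_module.parameters(), "lr": 1e-4},
    ])

    # "Cosine decay learning schedule"; decaying over the whole run
    # (T_max = total steps) is an assumption.
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS * STEPS_PER_EPOCH)

    for _ in range(EPOCHS):
        for _ in range(STEPS_PER_EPOCH):
            # loss computation and loss.backward() omitted in this sketch
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

Using parameter groups in a single optimizer is a natural reading of the two reported learning rates; the authors may instead use separate optimizers per module.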
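Since the Software Dependencies row finds no version numbers, here is a minimal sketch of how an exact environment could be recorded for reporting; the package list is an illustrative assumption, not taken from the paper.

    # Environment snapshot: print installed versions of key packages so a
    # paper or repository can pin them exactly. Package names are
    # illustrative assumptions, not the paper's actual dependency list.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("torch", "transformers", "peft"):  # hypothetical list
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")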