RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Authors: Qi Lv, Hao Li, Xiang Deng, Rui Shao, Michael Y Wang, Liqiang Nie

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of RoboMP² on both the VIMA benchmark and real-world tasks, with around 10% improvement over the baselines.
Researcher Affiliation | Academia | ¹School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen); ²School of Engineering, Great Bay University; ³School of Computing and Information Technology, Great Bay University.
Pseudocode | No | The paper includes figures (Figure 6, Figure 7) that show templates and examples of code-like structures for the generator, but these are not formally labeled as 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper includes the link RoboMP2.github.io on the first page; however, the linked website (https://robomp2.github.io/) states 'Code will be released soon', indicating the code is not yet publicly available.
Open Datasets | Yes | We employ VIMA (Jiang et al., 2023) as the test benchmark, which encompasses 17 tasks ranging from L1-level to L4-level difficulty.
Dataset Splits | No | The paper describes VIMA-Bench with L1- to L4-level difficulty but does not specify explicit numerical percentages or counts for the training, validation, and test splits in the main text or appendices.
Hardware Specification | Yes | The overall training time is around 24 hours on an 8×A100-80G-SXM4 platform.
Software Dependencies | No | The paper mentions software components such as ViT, flan-t5-xl, EVA-CLIP/g, GPT-4/GPT-3.5, GPT-4V, and the AdamW optimizer, but does not provide version numbers for these dependencies or libraries. (A minimal way to record such versions is sketched after the table.)
Experiment Setup | Yes | We set the epochs to 10, the batch size to 128, and the learning rates of the fusion module and LoRA module to 3e-5 and 1e-4, respectively. We adopt the AdamW optimizer and the cosine decay learning schedule. (A hedged configuration sketch follows the table.)
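The Experiment Setup row pins down the optimizer, schedule, epochs, and per-module learning rates but not the surrounding code. Below is a minimal sketch of that configuration, assuming PyTorch; fusion_module, lora_module, and STEPS_PER_EPOCH are hypothetical stand-ins, since the paper does not describe the training loop itself.

    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Hypothetical stand-ins for the paper's fusion and LoRA modules.
    fusion_module = torch.nn.Linear(512, 512)
    lora_module = torch.nn.Linear(512, 512)

    EPOCHS = 10             # as reported
    BATCH_SIZE = 128        # as reported
    STEPS_PER_EPOCH = 1000  # assumption; depends on dataset size

    # Two parameter groups are one way to realize the reported per-module
    # learning rates: 3e-5 for the fusion module, 1e-4 for the LoRA module.
    optimizer = AdamW([
        {"params": fusion_module.parameters(), "lr": 3e-5},
        {"params": lora_module.parameters(), "lr": 1e-4},
    ])

    # "Cosine decay learning schedule"; decaying over the whole run
    # (T_max = total steps) is an assumption.
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS * STEPS_PER_EPOCH)

    for _ in range(EPOCHS):
        for _ in range(STEPS_PER_EPOCH):
            # loss computation and loss.backward() omitted in this sketch
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

Using parameter groups in a single optimizer is a natural reading of the two reported learning rates; the authors may instead use separate optimizers per module.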
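Since the Software Dependencies row finds no version numbers, here is a minimal sketch of how an exact environment could be recorded for reporting; the package list is an illustrative assumption, not taken from the paper.

    # Environment snapshot: print installed versions of key packages so a
    # paper or repository can pin them exactly. Package names are
    # illustrative assumptions, not the paper's actual dependency list.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("torch", "transformers", "peft"):  # hypothetical list
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")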