Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
Authors: Qi Lv, Hao Li, Xiang Deng, Rui Shao, Michael Y Wang, Liqiang Nie
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of RoboMP$^2$ on both VIMA benchmark and real-world tasks, with around 10% improvement over the baselines. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen); (2) School of Engineering, Great Bay University; (3) School of Computing and Information Technology, Great Bay University. |
| Pseudocode | No | The paper includes figures (Figure 6, Figure 7) that show templates and examples of code-like structures for the generator, but these are not formally labeled as 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper includes a link 'RoboMP2.github.io' on the first page. However, the linked website (https://robomp2.github.io/) states 'Code will be released soon', indicating the code is not yet publicly available. |
| Open Datasets | Yes | We employ VIMA (Jiang et al., 2023) as the test benchmark which encompasses 17 tasks ranging from L1-level to L4-level difficulty. |
| Dataset Splits | No | The paper describes the VIMABench with L1-L4 levels of difficulty, but it does not specify explicit numerical percentages or counts for training, validation, and test splits within the main text or appendices. |
| Hardware Specification | Yes | The overall training time is around 24 hours on an 8×A100-80G-SXM4 platform. |
| Software Dependencies | No | The paper mentions software components like 'ViT', 'flan-t5-xl', 'EVA-CLIP/g', 'GPT4/GPT3.5', 'GPT4V', and 'AdamW optimizer' but does not provide specific version numbers for these software dependencies or libraries. |
| Experiment Setup | Yes | We set the epoch to 10, the batch size to 128, the learning rates of the fusion module and LoRA module to 3e-5 and 1e-4, respectively. We adopt the AdamW optimizer and the cosine decay learning schedule. |
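
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, sketched below in plain Python. This is only an illustration, not the authors' code (which was not available at the time of classification). The names `TrainConfig`, `cosine_decay`, and `lr_table` are hypothetical, and several details are assumptions the source does not state: the schedule decays to a minimum learning rate of zero, there is no warmup, and `steps_per_epoch` is a placeholder value.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row.
    epochs: int = 10
    batch_size: int = 128
    lr_fusion: float = 3e-5  # learning rate of the fusion module
    lr_lora: float = 1e-4    # learning rate of the LoRA module
    # Assumption (not stated in the source): decay to zero, no warmup.
    min_lr: float = 0.0


def cosine_decay(base_lr: float, step: int, total_steps: int,
                 min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lr_table(cfg: TrainConfig, steps_per_epoch: int):
    """Return (epoch, fusion_lr, lora_lr) at the start of each epoch."""
    total_steps = cfg.epochs * steps_per_epoch
    rows = []
    for epoch in range(cfg.epochs):
        step = epoch * steps_per_epoch
        rows.append((
            epoch,
            cosine_decay(cfg.lr_fusion, step, total_steps, cfg.min_lr),
            cosine_decay(cfg.lr_lora, step, total_steps, cfg.min_lr),
        ))
    return rows


if __name__ == "__main__":
    cfg = TrainConfig()
    # steps_per_epoch=100 is a placeholder; the real value depends on dataset size.
    for epoch, lr_f, lr_l in lr_table(cfg, steps_per_epoch=100):
        print(f"epoch {epoch}: fusion lr={lr_f:.2e}, LoRA lr={lr_l:.2e}")
```

In a real training run the two learning rates would typically be passed to the optimizer as separate parameter groups, so that the fusion and LoRA modules follow the same schedule shape from different starting values.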