Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration

Authors: Yishuai Cai, Xinglin Chen, Zhongxuan Cai, Yunxin Mao, Minglong Li, Wenjing Yang, Ji Wang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our algorithm in warehouse management and everyday service scenarios. Results demonstrate MRBTP's robustness and execution efficiency under varying settings, as well as the ability of the pre-trained LLM to generate effective task-specific subtrees for MRBTP. Experiments in warehouse management and everyday service scenarios demonstrate MRBTP's robustness and execution efficiency under varying settings, as well as the ability of pre-trained LLMs to generate effective task-specific subtrees for MRBTP.
Researcher Affiliation	Academia	College of Computer Science and Technology, National University of Defense Technology EMAIL
Pseudocode	Yes	Algorithm 1: One-step cross-tree expansion Algorithm 2: MRBTP
Open Source Code	Yes	Code https://github.com/DIDS-EI/MRBTP
Open Datasets	Yes	Scenarios (a) Warehouse Management. We extend the Minigrid (Chevalier-Boisvert et al. 2023) environment for multi-robot simulations with 4-8 robots in 4 rooms containing randomly placed packages. (b) Home Service. In the VirtualHome (Puig et al. 2018) environment, 2-4 robots interact with dozens of objects and perform hundreds of potential actions.
Dataset Splits	No	The paper mentions generating "randomly generated solvable multi-robot BT planning problems" and "a dataset of 75 instances across three levels of homogeneity" but does not provide specific train/test/validation splits for these instances or any other datasets used.
Hardware Specification	Yes	All experiments were conducted on a system equipped with an AMD Ryzen 9 5900X 12-core processor with a 3.70 GHz base clock and 128 GB of DDR4 RAM.
Software Dependencies	Yes	The model used to generate subtrees is gpt-4o-mini-2024-07-18 (OpenAI 2023). We tested different versions of large language models, including gpt3.5-turbo (2024.12) and gpt-4o-2024-08-06 (OpenAI 2023), for assisting in subtree pre-planning.
Experiment Setup	Yes	Settings (a) Homogeneity (α): The proportion of redundant actions assigned to robots, where α = 1 denotes complete homogeneity (identical action spaces) and α = 0 denotes complete heterogeneity (no overlap in action spaces). (b) Action Failure Probability (FP): The probability that a robot fails to execute an action. (c) Subtree Intention Sharing (Subtree IS) and Atomic Action Nodes Intention Sharing (Atomic IS): These terms refer to the application of Intention Sharing either among subtrees or at the level of individual atomic action nodes. (d) Feedback (F) and No Feedback (NF): This setting distinguishes between LLMs that use feedback during subtree generation and those that do not. In the Feedback condition, the LLM receives up to 3 feedback iterations, while in the No Feedback condition, no feedback is provided. Table 3 shows that subtree pre-planning significantly reduces BTs planning time under a 60-second constraint by minimizing redundancy through subtree reuse and similar robot action spaces.
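
For readers who want to re-run a sweep over the settings in the Experiment Setup row, the four knobs could be captured roughly as follows. This is a minimal sketch, not code from the MRBTP repository: the class name `ExperimentSetting`, its field names, and the helper `action_succeeds` are all hypothetical, and modelling action failure as an independent draw per action is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetting:
    """Hypothetical container for the four settings; not the authors' API."""
    homogeneity: float      # alpha, a proportion in [0, 1]
    failure_prob: float     # FP, a probability in [0, 1]
    intention_sharing: str  # "subtree" (Subtree IS) or "atomic" (Atomic IS)
    feedback_rounds: int    # 0 = No Feedback (NF); 1 to 3 = Feedback (F)

    def __post_init__(self):
        # Reject values outside the ranges stated in the setup description.
        if not 0.0 <= self.homogeneity <= 1.0:
            raise ValueError("homogeneity must lie in [0, 1]")
        if not 0.0 <= self.failure_prob <= 1.0:
            raise ValueError("failure_prob must lie in [0, 1]")
        if self.intention_sharing not in ("subtree", "atomic"):
            raise ValueError("intention_sharing must be 'subtree' or 'atomic'")
        if not 0 <= self.feedback_rounds <= 3:
            raise ValueError("feedback_rounds must be between 0 and 3")


def action_succeeds(rng: random.Random, failure_prob: float) -> bool:
    """Assumed failure model: each action fails independently with probability FP."""
    return rng.random() >= failure_prob


if __name__ == "__main__":
    setting = ExperimentSetting(
        homogeneity=0.5,
        failure_prob=0.2,
        intention_sharing="subtree",
        feedback_rounds=3,
    )
    rng = random.Random(0)  # fixed seed so repeated runs draw the same failures
    outcomes = [action_succeeds(rng, setting.failure_prob) for _ in range(10)]
    print(setting, outcomes)
```

Validating the ranges at construction time keeps a mistyped setting from silently producing a run that cannot be compared with the reported ones, and the seeded generator makes the sampled failures repeatable.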