Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multi-Agent Corridor Generating Algorithm

Authors: Arseniy Pertzovsky, Roni Stern, Roie Zivan, Ariel Felner

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate experimentally that MACGA and MACGA+PIBT outperform baseline algorithms in terms of success rate, runtime, and makespan across diverse MAPF benchmark grids. [...] Lastly, we conducted a large set of experiments on standard MAPF benchmarks [Stern et al., 2019] comparing MACGA and MACGA+PIBT to other standard and state-of-the-art suboptimal algorithms, namely PrP, PIBT, LNS2, LaCAM, and LaCAM*. The results show that MACGA and MACGA+PIBT can generate outstanding results in terms of success rate in many MAPF benchmarks.
Researcher Affiliation	Academia	Arseniy Pertzovsky, Roni Stern, Roie Zivan and Ariel Felner Ben-Gurion University of the Negev EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Pseudocode The high-level pseudocode of MACGA (excluding the blue text) and MACGA+PIBT (including the blue text) is illustrated in Algorithm 1. [...] Algorithm 1 MACGA+PIBT
Open Source Code	No	The paper states: "All algorithms were implemented in Python" but does not provide any specific links or explicit statements about releasing their source code for the methodology described.
Open Datasets	Yes	All experiments were performed on six different maps from the MAPF benchmark [Stern et al., 2019]: empty-32-32, random-32-32-10, random-32-32-20, room-32-32-4, maze-32-32-2, and maze-32-32-4 as they present different levels of difficulty. The maps are visualized in Figure 4.
Dataset Splits	No	The paper states, "We executed 20 random instances per every number of agents, map, and algorithm." This describes the experimental setup for evaluation, but not specific training/test/validation splits for a dataset in the machine learning sense.
Hardware Specification	Yes	All algorithms were implemented in Python and ran on a MacBook Air with an Apple M1 chip and 8GB of RAM.
Software Dependencies	No	The paper states "All algorithms were implemented in Python" but does not provide a specific version number for Python or any other software libraries or solvers used in the implementation.
Experiment Setup	Yes	The number of agents used in our experiments varied from 100 to 700. We executed 20 random instances per every number of agents, map, and algorithm. A time limit of 30 seconds was imposed on every instance.
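
The Experiment Setup row can be read as a grid of runs per algorithm (map × number of agents × instance), each scored against the 30-second limit. The sketch below is illustrative only and is not the authors' code. It assumes a step of 100 between agent counts (the paper excerpt gives only the range 100 to 700) and a hypothetical `(solved, runtime_seconds)` record format for each run.

```python
from itertools import product

# Map names as listed in the Open Datasets row.
MAPS = [
    "empty-32-32", "random-32-32-10", "random-32-32-20",
    "room-32-32-4", "maze-32-32-2", "maze-32-32-4",
]
INSTANCES_PER_SETTING = 20  # from the Experiment Setup row
TIME_LIMIT_S = 30.0         # from the Experiment Setup row
# Assumption: only the range 100..700 is stated; the step of 100 is illustrative.
AGENT_COUNTS = range(100, 701, 100)


def experiment_grid(maps=MAPS, agent_counts=AGENT_COUNTS,
                    instances=INSTANCES_PER_SETTING):
    """Yield (map_name, n_agents, instance_id) for every run of one algorithm."""
    return product(maps, agent_counts, range(instances))


def success_rate(runs, time_limit=TIME_LIMIT_S):
    """Fraction of runs solved within the time limit.

    `runs` is an iterable of (solved, runtime_seconds) pairs; this record
    format is hypothetical and chosen only for the example.
    """
    runs = list(runs)
    if not runs:
        return 0.0
    ok = sum(1 for solved, runtime in runs if solved and runtime <= time_limit)
    return ok / len(runs)
```

Under the assumed step, `experiment_grid()` enumerates 6 maps × 7 agent counts × 20 instances per algorithm; a different step changes only `AGENT_COUNTS`.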