Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

Authors: Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost improvements by at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.
Researcher Affiliation	Academia	1University of Southern California 2University of California, Irvine EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: ADDRESS Destroy Heuristic Algorithm 2: MAPF-LNS with our ADDRESS Heuristic
Open Source Code	Yes	Code https://github.com/Jimy Z13/ADDRESS
Open Datasets	Yes	We evaluate ADDRESS on five maps from the MAPF benchmark set of (Stern et al. 2019), namely (1) a Random map (Random-32-32-20), (2) two Game maps Ost003d and (3) Den520d, (4) a Warehouse map (Warehouse-20-40-10-2-2), and (5) a City map (Paris 1 256). All maps have different sizes and structures. We conduct all experiments on the publicly available 25 random scenarios per map.
Dataset Splits	No	We conduct all experiments on the publicly available 25 random scenarios per map. This text describes the instances used for evaluation but does not specify a training/test/validation split of a dataset for model training or evaluation. The paper focuses on an algorithmic solution rather than a machine learning model that would typically require such splits.
Hardware Specification	Yes	All experiments were run on a high-performance computing cluster with Cent OS Linux, Intel Xeon 2640v4 CPUs, and 64 GB RAM.
Software Dependencies	No	Our implementation is based on the public code of (Li et al. 2022; Phan et al. 2024b). This indicates reliance on other codebases but does not specify versions of programming languages, libraries, or solvers used in the current implementation.
Experiment Setup	Yes	We set the neighborhood size N = 8 (except for BALANCE, which automatically adapts N), K = 32, and use Thompson Sampling for ADDRESS and BALANCE, unless stated otherwise. ϵ-Greedy is used with ϵ = 1/2.