Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MA-LAMA: Exploiting the Multi-Agent Nature of Temporal Planning Problems

Authors: J. Caballero Testón, Maria D. R-Moreno

JAIR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 7, we conduct a two-part empirical evaluation of MA-LAMA. First, we evaluate MA-LAMA's coverage and plan quality performance in different domains. Subsequently, we analyze its search efficiency by comparing runtime with the quality of the generated plans in complex scenarios.
Researcher Affiliation	Academia	J. CABALLERO TESTÓN, Universidad de Alcalá, ISG, EPS, Spain; MARIA D. R-MORENO, Universidad de Alcalá, ISG, EPS, Spain and TNO, IAS, The Netherlands
Pseudocode	Yes	Algorithm 1 Agent Decomposition (AD) Input: MAP temporal task, 𝑒𝑀𝑃𝑇 ⟨𝑉, 𝐼, 𝐺, 𝑂, 𝑁, 𝑀⟩, where 𝑉 is the multi-valued variables set, 𝐼 is the initial state, 𝐺 is the goals set, 𝑂 is the operators set, 𝑁 is the multi-valued numeric variables set and 𝑀 is the metric.
Open Source Code	No	No explicit statement about code release or repository link found.
Open Datasets	Yes	We test the performance of MA-LAMA through a MA temporal domains test bench, which comprises the MA temporal Exploration domain and all IPC temporal domains [12] that present MA features. ... [12] A. Coles, C. Coles, M. Martinez, and P. Sidiropoulos. 2018. International planning competition 2018 temporal tracks. https://ipc2018-temporal.bitbucket.io/. Accessed: 2022-09-01. (2018).
Dataset Splits	No	The paper refers to planning problems from IPC temporal domains as test cases (e.g., 'Taxis (20)', 'Rovers (20)') but does not specify training/test/validation splits in the traditional machine learning sense for these problem sets. The entire set of problems is used for evaluation.
Hardware Specification	No	All executions are limited to 10 minutes and 4GB of RAM.
Software Dependencies	No	The paper mentions several planners and languages used (e.g., LAMA [45], PDDL2.1 [20]), but does not provide specific version numbers for any of them.
Experiment Setup	Yes	All executions are limited to 10 minutes and 4GB of RAM. ... The metric used is a weighted combination of battery usage and mission risk, similar to the one defined in Figure 3. ... The metric aims to minimize the overall sum, favoring UAV operations by assigning them lower penalty values compared to ROVER actions.
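
The Experiment Setup entry describes the plan metric only in words. The sketch below is one possible reading of it, not the paper's definition: it sums a weighted combination of battery usage and mission risk over the actions of a plan, with UAV actions given lower penalties than ROVER actions. The names `PENALTY`, `Action`, and `plan_metric`, the penalty values, the weights, and the use of action duration as a proxy for battery use are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical penalty values. The only property taken from the text is that
# UAV actions are penalised less than ROVER actions.
PENALTY = {
    "UAV": {"battery": 1.0, "risk": 1.0},
    "ROVER": {"battery": 2.0, "risk": 3.0},
}


@dataclass(frozen=True)
class Action:
    agent_type: str  # "UAV" or "ROVER"
    duration: float  # assumption: battery use scales with action duration


def plan_metric(plan, w_battery=0.5, w_risk=0.5):
    """Return the weighted sum of battery and risk penalties over a plan.

    Lower is better, matching a metric that is minimised. The weights are
    illustrative defaults, not values from the paper.
    """
    total = 0.0
    for action in plan:
        penalty = PENALTY[action.agent_type]
        total += w_battery * penalty["battery"] * action.duration
        total += w_risk * penalty["risk"]
    return total


uav_plan = [Action("UAV", 2.0)]
rover_plan = [Action("ROVER", 2.0)]
```

Under these assumed values, `plan_metric(uav_plan)` is smaller than `plan_metric(rover_plan)` for actions of equal duration, which is the preference for UAV operations that the entry describes.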