Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MA-LAMA: Exploiting the Multi-Agent Nature of Temporal Planning Problems
Authors: J. Caballero Testón, Maria D. R-Moreno
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 7, we conduct a two-part empirical evaluation of MA-LAMA. First, we evaluate MA-LAMA's coverage and plan quality performance in different domains. Subsequently, we analyze its search efficiency by comparing runtime with the quality of the generated plans in complex scenarios. |
| Researcher Affiliation | Academia | J. CABALLERO TESTÓN, Universidad de Alcalá, ISG, EPS, Spain; MARIA D. R-MORENO, Universidad de Alcalá, ISG, EPS, Spain and TNO, IAS, The Netherlands |
| Pseudocode | Yes | Algorithm 1 Agent Decomposition (AD). Input: MAP temporal task, given as a tuple of the multi-valued variables set, the initial state I, the goals set G, the operators set, the multi-valued numeric variables set, and the metric. |
| Open Source Code | No | No explicit statement about code release or repository link found. |
| Open Datasets | Yes | We test the performance of MA-LAMA through a MA temporal domains test bench, which comprises the MA temporal Exploration domain and all IPC temporal domains [12] that present MA features. ... [12] A. Coles, C. Coles, M. Martinez, and P. Sidiropoulos. 2018. International planning competition 2018 temporal tracks. https://ipc2018-temporal.bitbucket.io/. Accessed: 2022-09-01. (2018). |
| Dataset Splits | No | The paper refers to planning problems from IPC temporal domains as test cases (e.g., 'Taxis (20)', 'Rovers (20)') but does not specify training/test/validation splits in the traditional machine learning sense for these problem sets. The entire set of problems is used for evaluation. |
| Hardware Specification | No | All executions are limited to 10 minutes and 4GB of RAM. |
| Software Dependencies | No | The paper mentions several planners and languages used (e.g., LAMA [45], PDDL2.1 [20]), but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | All executions are limited to 10 minutes and 4GB of RAM. ... The metric used is a weighted combination of battery usage and mission risk, similar to the one defined in Figure 3. ... The metric aims to minimize the overall sum, favoring UAV operations by assigning them lower penalty values compared to ROVER actions. |
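
The experiment setup above states only the limits (10 minutes and 4GB of RAM per execution), not how they were enforced. As a rough illustration for anyone rerunning such experiments, the sketch below shows one way to apply comparable limits to a planner process on a Unix-like system. The function name `run_limited`, the use of wall-clock time, and the use of an address-space cap are assumptions made for this example, not details taken from the paper.

```python
import resource
import subprocess
import sys

TIME_LIMIT_S = 10 * 60          # 10 minutes, as reported in the paper
MEMORY_LIMIT_B = 4 * 1024 ** 3  # 4 GB, as reported in the paper


def _apply_memory_limit():
    """Runs in the child process just before exec (Unix only)."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Never request more than the existing hard limit allows.
    if hard == resource.RLIM_INFINITY:
        limit = MEMORY_LIMIT_B
    else:
        limit = min(MEMORY_LIMIT_B, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))


def run_limited(cmd, time_limit_s=TIME_LIMIT_S):
    """Hypothetical helper: run `cmd` under a wall-clock and memory cap.

    Assumption: the time limit is wall-clock time; the paper does not say
    whether CPU time or wall-clock time was measured.
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=time_limit_s,
            preexec_fn=_apply_memory_limit,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "stdout": proc.stdout}


if __name__ == "__main__":
    # Stand-in command; a real run would invoke the planner executable.
    print(run_limited([sys.executable, "-c", "print('plan found')"]))
```

A run that exceeds the time limit is reported as `timeout`, while a run that exhausts the memory cap typically exits with a non-zero code and is reported as `failed`; a coverage count would treat both as unsolved.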