Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Approach to Quantify Plans Robustness in Real-world Applications

Authors: Francesco Percassi, Sandra Castellanos-Paez, Romain Rombourg, Mauro Vallati

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We validate our approach in two real-world domains, demonstrating its effectiveness." and "We demonstrate the applicability of our framework on realistic benchmarks, showcasing its effectiveness in evaluating plan robustness." and Section "7 Experimental Evaluation", which describes applying the proposed framework to two case studies: Urban Traffic Control and Baxter. |
| Researcher Affiliation | Academia | (1) School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom; (2) G2ELab, Grenoble INP, CNRS, Université Grenoble Alpes, Grenoble, France |
| Pseudocode | No | The paper describes its methodology using mathematical formulations and descriptive text, without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "An example of how to set up these computations is available at https://gitlab.com/Edmond Dantes/robustnessijcai." |
| Open Datasets | No | The paper describes using 'historical data' for the UTC domain and generating a 'population of 1000 instances' for the Baxter domain, but it does not provide concrete access information (e.g., links, DOIs, specific repositories, or formal citations) for the datasets used in its experiments. |
| Dataset Splits | No | The paper refers to a 'sample population of size 1000' and '90 instances derived from historical data' for evaluating plan robustness, and to '10 plans for the generated population of 1000 instances', but it does not specify training/validation/test splits for model development or evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'ENHSP' for plan generation and the 'PPS tool' for plan validation, but it does not provide version numbers for these software dependencies. |
| Experiment Setup | Yes | "We set the significance level to $\alpha = 0.05$, while the distribution over $I$ and the number of trials $N$ are case-specific. For (ii) and (iii), we propose two domain-specific distance metrics. All plans are generated using ENHSP with Greedy Best-First Search and either $h_{max}$ or $h_{add}$ heuristics [Scala et al., 2016]." and "For the angle $\theta$ in a given initial state, we assume a random variable $\theta_\epsilon$ that follows $\theta_\epsilon \sim \mathcal{N}(0, \sigma^2)$ with $\sigma = \pi/12$. We use the following function $f(x) = x - 2\pi \lfloor x / 2\pi \rfloor$ to guarantee that $\theta + \theta_\epsilon$ lies within the interval $[0, 2\pi]$." and "$B_{min}$ values for each plan $\pi_i$, with $R = 0.9$." |
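
The angular noise model quoted in the Experiment Setup entry can be illustrated with a short Python sketch. It is not taken from the authors' repository. It assumes the wrap function is the usual modulo-2π map, f(x) = x − 2π⌊x / 2π⌋, and that a population is built by perturbing one nominal angle with zero-mean Gaussian noise of standard deviation π/12. The function names (`wrap_angle`, `perturb_angle`, `sample_population`), the fixed seed, and the example nominal angle are hypothetical. The default size of 1000 mirrors the population size mentioned for the Baxter domain.

```python
import math
import random

SIGMA = math.pi / 12   # standard deviation of the angular noise, as quoted above
TWO_PI = 2 * math.pi


def wrap_angle(x: float) -> float:
    """Assumed wrap function f(x) = x - 2*pi*floor(x / (2*pi)).

    Maps any real angle back into the interval [0, 2*pi].
    """
    return x - TWO_PI * math.floor(x / TWO_PI)


def perturb_angle(theta: float, rng: random.Random, sigma: float = SIGMA) -> float:
    """Add zero-mean Gaussian noise to theta, then wrap the result."""
    return wrap_angle(theta + rng.gauss(0.0, sigma))


def sample_population(theta: float, n: int = 1000, seed: int = 0) -> list[float]:
    """Draw n perturbed copies of a nominal angle (hypothetical helper).

    The seed is fixed only so that the sketch is repeatable.
    """
    rng = random.Random(seed)
    return [perturb_angle(theta, rng) for _ in range(n)]


if __name__ == "__main__":
    # A nominal angle close to 0, so some noisy samples wrap around to near 2*pi.
    population = sample_population(theta=0.1)
    print(min(population), max(population))
```

Wrapping after adding the noise keeps every sampled angle a valid orientation, even when the nominal angle sits close to 0 or 2π and the noise pushes it across the boundary.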