Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Approach to Quantify Plans Robustness in Real-world Applications
Authors: Francesco Percassi, Sandra Castellanos-Paez, Romain Rombourg, Mauro Vallati
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We validate our approach in two real-world domains, demonstrating its effectiveness." and "We demonstrate the applicability of our framework on realistic benchmarks, showcasing its effectiveness in evaluating plan robustness." and Section "7 Experimental Evaluation", which describes applying the proposed framework to two case studies: Urban Traffic Control and Baxter. |
| Researcher Affiliation | Academia | (1) School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom; (2) G2ELab, Grenoble INP, CNRS, Université Grenoble Alpes, Grenoble, France |
| Pseudocode | No | The paper describes its methodology using mathematical formulations and descriptive text, without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | An example of how to set up these computations is available at https://gitlab.com/Edmond Dantes/robustnessijcai. |
| Open Datasets | No | The paper describes using 'historical data' for the UTC domain and generating a 'population of 1000 instances' for the Baxter domain, but it does not provide concrete access information (e.g., links, DOIs, specific repositories, or formal citations for the specific datasets used in their experiments). |
| Dataset Splits | No | The paper refers to a 'sample population of size 1000' and '90 instances derived from historical data' for evaluating plan robustness, and '10 plans for the generated population of 1000 instances' but does not specify training/test/validation dataset splits in the context of model development or evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'ENHSP' for plan generation and 'PPS tool' for plan validation, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | "We set the significance level to $\alpha = 0.05$, while the distribution over I and the number of trials N are case-specific. For (ii) and (iii), we propose two domain-specific distance metrics. All plans are generated using ENHSP with Greedy Best-First Search and either $h_{max}$ or $h_{add}$ heuristics [Scala et al., 2016]." and "For the angle $\theta$ in a given initial state, we assume a random variable $\theta_\epsilon$ that follows $\theta_\epsilon \sim \mathcal{N}(0, \sigma^2)$ with $\sigma = \pi/12$. We use the following function $f(x) = x - 2\pi \lfloor x / (2\pi) \rfloor$ to guarantee that $\theta + \theta_\epsilon$ lies within the interval $[0, 2\pi]$." and "$B_{min}$ values for each plan $\pi_i$, with $R = 0.9$." |
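
The angle perturbation quoted in the Experiment Setup row can be illustrated with a short sketch. It is not taken from the paper's repository. It assumes that the wrapping function is the usual modulo operation, x - 2π·floor(x / 2π), and that a perturbed population is built by adding independent Gaussian noise with σ = π/12 to a nominal angle. The names `wrap_angle`, `perturb_angle`, and `sample_population` are hypothetical, and the population size of 1000 only mirrors the figure quoted for the Baxter domain.

```python
import math
import random

TWO_PI = 2.0 * math.pi
SIGMA = math.pi / 12.0  # standard deviation quoted in the Experiment Setup row


def wrap_angle(x: float) -> float:
    """Map any real angle into [0, 2*pi) using x - 2*pi*floor(x / (2*pi)).

    Assumption: this is the intended reading of the wrapping function f.
    """
    wrapped = x - TWO_PI * math.floor(x / TWO_PI)
    # Rounding can push the result to exactly 2*pi for tiny negative inputs;
    # fold that case back to 0 so the half-open interval holds.
    return 0.0 if wrapped >= TWO_PI else wrapped


def perturb_angle(theta: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian noise to theta and wrap the result."""
    noise = rng.gauss(0.0, SIGMA)
    return wrap_angle(theta + noise)


def sample_population(theta: float, n: int, seed: int = 0) -> list[float]:
    """Draw n perturbed copies of a nominal angle (hypothetical helper)."""
    rng = random.Random(seed)
    return [perturb_angle(theta, rng) for _ in range(n)]


if __name__ == "__main__":
    population = sample_population(theta=0.1, n=1000, seed=42)
    print(min(population), max(population))
```

A nominal angle close to 0 shows why the wrap matters: roughly a third of the samples would otherwise be negative, and after wrapping they land just below 2π. The sketch returns values in the half-open interval [0, 2π), which differs from the closed interval quoted above only at the endpoint 2π.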