Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

An Approach to Quantify Plans Robustness in Real-world Applications

Authors: Francesco Percassi, Sandra Castellanos-Paez, Romain Rombourg, Mauro Vallati

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"We validate our approach in two real-world domains, demonstrating its effectiveness." and "We demonstrate the applicability of our framework on realistic benchmarks, showcasing its effectiveness in evaluating plan robustness." and Section "7 Experimental Evaluation", which describes applying the proposed framework to two case studies: Urban Traffic Control and Baxter.
Researcher Affiliation	Academia	1 School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom; 2 G2ELab, Grenoble INP, CNRS, Université Grenoble Alpes, Grenoble, France; EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper describes its methodology using mathematical formulations and descriptive text, without presenting any structured pseudocode or algorithm blocks.
Open Source Code	Yes	An example of how to set up these computations is available at https://gitlab.com/Edmond Dantes/robustnessijcai.
Open Datasets	No	The paper describes using 'historical data' for the UTC domain and generating a 'population of 1000 instances' for the Baxter domain, but it does not provide concrete access information (e.g., links, DOIs, specific repositories, or formal citations for the specific datasets used in their experiments).
Dataset Splits	No	The paper refers to a 'sample population of size 1000' and '90 instances derived from historical data' for evaluating plan robustness, and '10 plans for the generated population of 1000 instances' but does not specify training/test/validation dataset splits in the context of model development or evaluation.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using 'ENHSP' for plan generation and 'PPS tool' for plan validation, but it does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	"We set the significance level to α = 0.05, while the distribution over I and the number of trials N are case-specific. For (ii) and (iii), we propose two domain-specific distance metrics. All plans are generated using ENHSP with Greedy Best-First Search and either h_max or h_add heuristics [Scala et al., 2016]." and "For the angle θ in a given initial state, we assume a random variable θ_ϵ that follows θ_ϵ ∼ N(0, σ²) with σ = π/12. We use the following function f(x) = x − 2π⌊x/(2π)⌋ to guarantee that θ + θ_ϵ lies within the interval [0, 2π]." and "B_min values for each plan π_i, with R = 0.9."
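
The angle perturbation quoted in the Experiment Setup row can be illustrated with a short sketch. It assumes the wrapping function is the floor-based form f(x) = x − 2π⌊x/(2π)⌋ (the operators are lost in the extracted text, so this is a reconstruction) and that σ = π/12. The function names, the seed parameter, and the sample size used in the usage note are hypothetical and are not taken from the paper or the authors' repository.

```python
import math
import random

# Standard deviation of the angular noise, as quoted in the Experiment Setup row.
SIGMA = math.pi / 12


def wrap_angle(x: float) -> float:
    """Assumed form of f: x - 2*pi*floor(x / (2*pi)), which maps x into [0, 2*pi]."""
    return x - 2 * math.pi * math.floor(x / (2 * math.pi))


def perturb_angle(theta: float, rng: random.Random) -> float:
    """Add noise theta_eps ~ N(0, SIGMA^2) to theta, then wrap the sum."""
    theta_eps = rng.gauss(0.0, SIGMA)
    return wrap_angle(theta + theta_eps)


def sample_population(theta: float, n: int, seed: int = 0) -> list[float]:
    """Draw n perturbed copies of theta (hypothetical helper, seeded for repeatability)."""
    rng = random.Random(seed)
    return [perturb_angle(theta, rng) for _ in range(n)]
```

Calling `sample_population(0.1, 1000)` would produce 1000 perturbed angles around θ = 0.1, each wrapped back into the interval; whether the authors generate their Baxter population this way is not stated in the text above.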