Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Generalising Planning Environment Redesign
Authors: Alberto Pozanco, Ramon Fraga Pereira, Daniel Borrajo
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments over a set of environment redesign benchmarks show that our general approach outperforms existing approaches when using well-known metrics, such as facilitating the recognition of goals, as well as its effectiveness when solving environment redesign tasks that optimise a novel set of different metrics. We now present the experiments carried out to evaluate GER. |
| Researcher Affiliation | Collaboration | Alberto Pozanco¹*, Ramon Fraga Pereira²*, and Daniel Borrajo¹; ¹J.P. Morgan AI Research, ²University of Manchester, UK |
| Pseudocode | Yes | Algorithm 1: GER: A General Environment Redesign Approach |
| Open Source Code | Yes | Benchmarks and GER's code are available on GitHub¹. ¹https://github.com/ramonpereira/general-environment-redesign |
| Open Datasets | Yes | We have created a benchmark set that contains 300 planning environment problems equally split across the five well-known domains: BLOCKS-WORDS, DEPOTS, GRID, IPC-GRID, and LOGISTICS. The environments are encoded in PDDL (Planning Domain Definition Language) (McDermott et al. 1998). Benchmarks and GER's code are available on GitHub¹. ¹https://github.com/ramonpereira/general-environment-redesign |
| Dataset Splits | No | The paper describes creating a benchmark set and evaluating performance, but does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | Yes | We have run all experiments using 4 vCPU AMD EPYC 7R13 Processor 2.95GHz with 32GB of RAM |
| Software Dependencies | No | The paper mentions PDDL and SYM-K (von Tschammer, Mattmüller, and Speck 2022) but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We run GER with C = {time limit = 900s or memory limit = 4GB}. We used the same stopping condition C for both GER and GRD-LS. We also set a limit of 1,000 plans to prevent disk overflows and avoid GER spending all the time computing the plan-library in redesign problems with a large number of optimal plans. |
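
The stopping condition C and the plan cap quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not GER's actual implementation: the names `StoppingCondition`, `should_stop`, `collect_plans`, and the `memory_usage` probe are hypothetical, and only the three limits (900 s, 4 GB, 1,000 plans) come from the quoted text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, TypeVar

Plan = TypeVar("Plan")


@dataclass(frozen=True)
class StoppingCondition:
    """Hypothetical container for the limits quoted in the table."""
    time_limit_s: float = 900.0             # time limit = 900s
    memory_limit_bytes: int = 4 * 1024**3   # memory limit = 4GB
    max_plans: int = 1000                   # cap on the plan library

    def should_stop(self, elapsed_s: float, memory_bytes: int) -> bool:
        # C holds as soon as either the time or the memory limit is reached.
        return (elapsed_s >= self.time_limit_s
                or memory_bytes >= self.memory_limit_bytes)


def collect_plans(
    plans: Iterable[Plan],
    cond: StoppingCondition,
    memory_usage: Callable[[], int],
    clock: Callable[[], float] = time.monotonic,
) -> List[Plan]:
    """Gather plans until the plan cap or the stopping condition is hit.

    `memory_usage` is an assumed caller-supplied probe returning bytes in use.
    """
    start = clock()
    library: List[Plan] = []
    for plan in plans:
        if len(library) >= cond.max_plans:
            break  # plan cap reached
        if cond.should_stop(clock() - start, memory_usage()):
            break  # time or memory limit reached
        library.append(plan)
    return library
```

Passing the clock and the memory probe as parameters keeps the sketch portable and easy to test; a real planner would more likely enforce these limits at the process level.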