Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Goal Recognition Design in Deterministic Environments

Authors: Sarah Keren, Avigdor Gal, Erez Karpas

JAIR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our empirical evaluation we instantiate a variety of independent, persistent and monotonic-nd GRD models that comply with the requirements specified above and show the effect of design on WCD. GRD analysis consists of two core tasks, namely calculating WCD and minimizing it. Accordingly, we divide our evaluation into two main parts. In the first, we measure WCD in different goal recognition settings, and evaluate the methods we have suggested to calculate it. In particular, we compare our compilation-based approaches, and examine their efficiency for the different GRD settings. The second part of our analysis focuses on the design task, and evaluates WCD reduction achieved through redesign in various settings, using the different modification methods suggested in Section 5.3. We also examine the benefits of pruning using the pruned-reduce algorithm, and compare it to exhaustive-reduce.
Researcher Affiliation	Academia	Sarah Keren EMAIL Harvard University School of Engineering and Applied Sciences Cambridge, Massachusetts 02138, USA Avigdor Gal EMAIL Erez Karpas EMAIL Technion Israel Institute of Technology Haifa 3200003, Israel
Pseudocode	Yes	Algorithm 1 wcd-bfs
Open Source Code	Yes	A full code base and dataset together with a GRD task generator can be found at https://github.com/sarah-keren/goal-recognition-design
Open Datasets	Yes	Our dataset consists of four uniform cost goal recognition domains adapted from Ramirez and Geffner (2009), namely Grid-Navigation (GRID), IPC-Grid+ (GRID+), Blockwords (BLOCK), and Logistics (LOG). We also examined three uniform cost domains adapted from Pereira et al. (2017), namely Intrusion Detection (I-DET), Depots (DEP), and Campus (CAM). All benchmarks are based on PDDL domains from the deterministic track of the International Planning Competitions (IPC). ... A full code base and dataset together with a GRD task generator can be found at https://github.com/sarah-keren/goal-recognition-design
Dataset Splits	No	The paper describes the generation of problem instances for different observability settings (FO, NO, POD, POND) and the application of various modification methods, but it does not specify explicit training/test/validation dataset splits in the conventional sense used for machine learning models. The experiments focus on calculating and reducing WCD for these generated instances rather than on model training and evaluation using data splits.
Hardware Specification	Yes	Experiments were run on Intel(R) Xeon(R) CPU X5690 machines, with a time limit of 30 minutes and memory limit of 2 GB.
Software Dependencies	No	For the solution of the compiled planning problems, we used the Fast Downward planning system (Helmert, 2006), running A* with the LM-CUT heuristic (Helmert & Domshlak, 2009) for all but the ISS domain, for which the IPDB heuristic (Haslum, Botea, Helmert, Bonet, Koenig, et al., 2007) was used. The paper mentions the Fast Downward planning system and specific heuristics, but does not provide version numbers for these software components or any other ancillary software used in the experiments.
Experiment Setup	Yes	We implemented the four modification methods described in Section 5.3, namely action removal (AR), action conditioning (AC), sensor placement (SP), and single-action sensor refinement (SR). ... To evaluate the effect of design on WCD, and particularly the effect of specific modification types, we examined all instances with a design budget of 4 assigned once for each modification type and once as an overall budget. The constraint function required the optimal cost to any of the goals to remain unchanged.
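
To make the two tasks named in the table (calculating WCD and reducing it through redesign) more concrete, here is a minimal Python sketch. It is not the paper's wcd-bfs algorithm or its compilation to planning. It assumes a fully observable, unit-cost, 4-connected grid with an optimal agent, and it takes WCD to be the length of the longest path prefix that is still consistent with optimal paths to at least two goals. The function names `bfs_distances` and `wcd`, the grid layout, and the use of a blocked cell as a stand-in for action removal are all illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_distances(source, width, height, blocked):
    """Unit-cost shortest-path distances from `source` on a 4-connected grid."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in blocked and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


def wcd(start, goals, width, height, blocked=frozenset()):
    """Longest shortest-path prefix shared by optimal paths to two goals.

    Assumes every goal is reachable from `start` (otherwise a KeyError is
    raised) and that moves are reversible, so one BFS per goal suffices.
    """
    from_start = bfs_distances(start, width, height, blocked)
    from_goal = {g: bfs_distances(g, width, height, blocked) for g in goals}

    def on_optimal_path(cell, goal):
        # A cell lies on some optimal path iff the detour through it is free.
        return from_start[cell] + from_goal[goal][cell] == from_start[goal]

    worst = 0
    for cell in from_start:
        for g1, g2 in combinations(goals, 2):
            if on_optimal_path(cell, g1) and on_optimal_path(cell, g2):
                worst = max(worst, from_start[cell])
    return worst


# Hypothetical 5x3 grid: the agent starts at the bottom centre and the two
# goals sit in the top corners.
start, goals = (2, 0), [(0, 2), (4, 2)]
before = wcd(start, goals, width=5, height=3)

# Blocking cell (2, 1) removes every move into it, a coarse stand-in for
# action removal (AR).
after = wcd(start, goals, width=5, height=3, blocked={(2, 1)})
```

In this toy instance the agent can walk up the middle column for two steps without revealing its goal, so `before` is 2. Blocking the cell above the start forces an immediate left or right commitment, so `after` is 0, while the optimal cost to each goal stays at 4, which matches the kind of constraint described in the Experiment Setup row.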