Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Analyzing Intentional Behavior in Autonomous Agents under Uncertainty

Authors: Filip Cano Córdoba, Samuel Judson, Timos Antonopoulos, Katrine Bjørner, Nicholas Shoemaker, Scott J. Shapiro, Ruzica Piskac, Bettina Könighofer

IJCAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In a case study, we show how our method can distinguish between intentional and accidental traffic collisions. ... In this section, we showcase our method on a traffic-related scenario related to Examples 1-2, and that is illustrated in Figure 2. ... All experiments were executed on an Intel Core i5 CPU with 16GB of RAM running Ubuntu 20.04. We use TEMPEST [Pranger et al., 2021] as our model checking engine. 6.1 Model of Environment 6.2 Analysis of a Trace 6.3 Comparative Analysis of Several Agents
Researcher Affiliation	Academia	1 Graz University of Technology, 2 Yale University, 3 New York University; EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology in prose and includes a high-level flowchart (Figure 1), but it does not provide structured pseudocode or algorithm blocks.
Open Source Code	Yes	Find code and experimental details in the accompanying repository https://github.com/filipcano/intentional-autonomous-agents.
Open Datasets	No	The paper describes modeling a custom environment and scenario (Section 6.1) rather than using a pre-existing, publicly available dataset with concrete access information. No dataset is mentioned for public access.
Dataset Splits	No	The paper analyzes a specific scenario and generates counterfactual scenarios for analysis, but it does not describe a traditional machine learning experimental setup with training, validation, and test dataset splits with specified percentages or sample counts.
Hardware Specification	Yes	All experiments were executed on an Intel Core i5 CPU with 16GB of RAM running Ubuntu 20.04.
Software Dependencies	No	The paper mentions using "TEMPEST [Pranger et al., 2021] as our model checking engine" and "Ubuntu 20.04", but it does not provide specific version numbers for software libraries, frameworks, or solvers beyond the operating system.
Experiment Setup	Yes	As thresholds to evaluate evidence of intention, we use $\delta_\rho^L = 0.25$, $\delta_\rho^U = 0.75$ and $\delta_\sigma = 0.5$. ... We change the following variables: Slipperiness range... Slipperiness factor... Hesitancy factor... Visibility... The variables and the ranges considered for generating counterfactuals are summarized in Table 1.