Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On the Testability of BDI Agent Systems

Authors: M. Winikoff, S. Cranefield

JAIR 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In order to compare a real BDI platform's execution with the results of our abstract BDI execution model we implemented the two goal-plan trees in Appendix A in the JACK agent programming language. The structure of the plans and events precisely mirrors the structure of the tree. As in the goal-plan tree, each event has two relevant plans, both of which are always applicable, and selectable in either order. Actions were implemented using code that printed out the action name, and then, depending on a condition (described below), either continued execution or triggered failure (and printed out a failure indicator)... A test harness systematically generated all inputs, thus forcing all decision options to be explored. The results matched those computed by the Prolog code of Figure 3, giving precisely the same six traces for the smaller tree, and the same 162 traces for the larger tree.
Researcher Affiliation	Academia	Michael Winikoff EMAIL Stephen Cranefield EMAIL Department of Information Science, University of Otago, New Zealand
Pseudocode	Yes	Boolean function execute(an-event) let relevant-plans = set of plan instances resulting from matching all plans' event patterns to an-event let tried-plans = ∅ while true do let applicable-plans = set of plan instances resulting from solving the context conditions of relevant-plans applicable-plans := applicable-plans \ tried-plans if applicable-plans is empty then return false select plan p ∈ applicable-plans tried-plans := tried-plans ∪ {p} if execute(p.body) = true then return true endwhile Boolean function execute(plan-body) if plan-body is empty then return true elseif execute(first(plan-body)) = false then return false else return execute(rest(plan-body)) endif Boolean function execute(action) attempt to perform the action if action executed successfully then return true else return false endif Figure 2: BDI Execution Cycle. Prolog code implementing the process can be found in Figure 3.
Open Source Code	No	The code is available upon request from the authors.
Open Datasets	No	The paper refers to an industrial application at Daimler (Burmeister, Arnold, Copaciu, & Rimassa, 2008) and a goal-plan tree from that work (Figure 11). However, it does not provide concrete access information (link, DOI, repository, etc.) for this or any other dataset in a publicly available format.
Dataset Splits	No	The paper analyzes the behavior space of BDI agents through mathematical models and a 'reality check' against an implemented BDI system and a real-world goal-plan tree structure. It does not involve experiments on datasets that require training, validation, or test splits.
Hardware Specification	No	The paper mentions implementing goal-plan trees in the JACK agent programming language and using 'System.out.print' for actions. However, it does not specify any particular CPU models, GPU models, memory amounts, or detailed computer specifications used for running the experiments or simulations.
Software Dependencies	No	The paper mentions using the 'JACK agent programming language', 'Prolog code' for expansion, and 'Python rmpoly and GMPY2 libraries' for generating polynomial representations. However, none of these mentions include specific version numbers for the software components.
Experiment Setup	No	The paper describes that a 'test harness systematically generated all inputs, thus forcing all decision options to be explored' and that 'conditions that determined whether an action failed or succeeded, and which plan was selected first, were controlled by an input (N.i, a Java class variable)'. This details the experimental logic but does not provide specific hyperparameters (e.g., learning rate, batch size) or other system-level training settings typically found in experimental setup sections.
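
The execution cycle quoted in the Pseudocode row can be read as three mutually recursive functions. The following is a minimal Python sketch of that reading, not the authors' JACK or Prolog code. The class names (Action, Plan, Event), the succeeds callback, the run helper, and the "!" failure marker are all hypothetical names introduced for illustration. Two simplifying assumptions are made: every relevant plan is always applicable (as the quoted passage states for the test trees), and the first untried plan is always selected, whereas the paper's harness explores both selection orders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass(eq=False)  # identity-based equality, so plans can be kept in a set
class Action:
    name: str


@dataclass(eq=False)
class Plan:
    name: str
    body: List["Step"] = field(default_factory=list)


@dataclass(eq=False)
class Event:
    name: str
    plans: List[Plan] = field(default_factory=list)  # relevant plans


Step = Union[Action, Event]


def execute_event(event: Event, succeeds: Callable[[str], bool], trace: List[str]) -> bool:
    """Try relevant plans one at a time until one succeeds or none remain."""
    tried = set()
    while True:
        # Assumption: all relevant plans are applicable; drop those already tried.
        applicable = [p for p in event.plans if p not in tried]
        if not applicable:
            return False
        plan = applicable[0]  # assumption: fixed selection order
        tried.add(plan)
        if execute_body(plan.body, succeeds, trace):
            return True


def execute_body(body: List[Step], succeeds: Callable[[str], bool], trace: List[str]) -> bool:
    """Run the steps in order; the body fails as soon as one step fails."""
    for step in body:
        if not execute_step(step, succeeds, trace):
            return False
    return True


def execute_step(step: Step, succeeds: Callable[[str], bool], trace: List[str]) -> bool:
    """Dispatch a sub-event to execute_event, or attempt an action and record it."""
    if isinstance(step, Event):
        return execute_event(step, succeeds, trace)
    ok = succeeds(step.name)
    trace.append(step.name if ok else step.name + "!")  # "!" marks a failed action
    return ok


def run(event: Event, failing=frozenset()):
    """Hypothetical helper: execute an event where the named actions fail."""
    trace: List[str] = []
    ok = execute_event(event, lambda name: name not in failing, trace)
    return ok, trace


# A toy tree (not one of the paper's Appendix A trees): one event, two plans.
goal = Event("G", [Plan("P1", [Action("a")]), Plan("P2", [Action("b")])])
print(run(goal, failing={"a"}))  # (True, ['a!', 'b'])
```

When action a fails, the cycle falls back to the second plan, so the trace records the failed attempt followed by b. A harness in the spirit of the one described would call run once for every combination of failing actions (and plan orderings) and collect the distinct traces.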