Maintaining Evolving Domain Models
Authors: Dan Bryce, J. Benton, Michael W. Boldt
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that Marshal learns more accurate models of planning domains if it expects and exploits model evolution. We also show that integrating interaction modalities beyond observing plans also helps to learn more accurate models. We illustrate these findings on several domains drawn from the learning track of the International Planning Competition. In our evaluation, we show that Marshal can learn how the user's mental model has changed. We employ a simulated, scripted user agent capable of (1) evolving its mental model multiple times from an initially provided model, (2) sending Marshal answers to queries and simulated (scripted) plans based on those models, and (3) interfacing with Marshal to provide empirical data on the error between the Marshal learned model and the simulated user's model. |
| Researcher Affiliation | Collaboration | Dan Bryce, SIFT, LLC (dbryce@sift.net); J. Benton, NASA ARC & AAMU-RISE Foundation (j.benton@nasa.gov); Michael W. Boldt, SIFT, LLC (mboldt@sift.net) |
| Pseudocode | No | The paper describes the algorithm steps in paragraph form, but does not include structured pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available. |
| Open Datasets | Yes | We evaluate on the parking, spanner, transport, and floortile domains from the Learning Track of the 2014 International Planning Competition (IPC-2014). |
| Dataset Splits | No | The paper describes using '108 plans' as observations for Marshal and a 'testing set of 28 plans' for evaluation, but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or defined subsets in a manner that allows direct reproduction of data partitioning. |
| Hardware Specification | Yes | Our experiments were run on a cluster containing Intel Xeon Harpertown quad-core CPUs, running at 2.83 GHz with 2 GB of memory given to each Marshal instance. |
| Software Dependencies | No | The paper mentions using 'Fast Downward (Helmert, 2006)' but does not provide specific version numbers for this or any other software dependencies, libraries, or operating systems used for the experiments. |
| Experiment Setup | Yes | Marshal uses 128, 256, 512, or 1024 particles in its particle filter. For each planning domain we assume that the user updates their mental model six times and after each change provides a series of 108 plans that they believe are valid. Each change is over a precondition, add or delete effect in an action schema. After each plan, the user answers a series of Marshal's questions in the order that Marshal determines. After each series of plans, and just prior to the next drift in the user's model, we ask Marshal to calculate the probability (given its distribution over models) that each plan within a testing set of 28 plans is valid. |
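
The Research Type and Experiment Setup rows together describe an evaluation loop that can be mirrored in simulation: a scripted user whose model drifts six times, 108 observed plans per drift, a particle filter over candidate models, and a 28-plan held-out score computed just before each drift. The following is a minimal, hypothetical sketch of that protocol, assuming a toy bit-vector stand-in for domain models; the `random_model`, `drift`, `plan_is_valid`, and `sample_plan` helpers, the resampling weights, and the mutation rate are all illustrative assumptions, not Marshal's actual implementation.

```python
import random

# Hypothetical sketch of the evaluation protocol described above: a
# simulated user whose mental model drifts, and a particle-filter
# learner scored on held-out plans before each drift. All names and
# modeling choices here are illustrative, not Marshal's actual code.

N_PARTICLES = 128        # the paper sweeps 128, 256, 512, and 1024
N_DRIFTS = 6             # the user updates their mental model six times
PLANS_PER_DRIFT = 108    # observed plans provided after each change
TEST_PLANS = 28          # held-out plans scored just before each drift

N_FACTS = 20  # toy stand-in for precondition/effect flags in a schema

def random_model():
    # A model is a bit-vector: which preconditions/effects are present.
    return tuple(random.randint(0, 1) for _ in range(N_FACTS))

def drift(model):
    # One change over a precondition, add effect, or delete effect.
    i = random.randrange(N_FACTS)
    m = list(model)
    m[i] ^= 1
    return tuple(m)

def plan_is_valid(model, plan):
    # Toy validity check: every fact the plan relies on must be present.
    return all(model[i] for i in plan)

def sample_plan(model):
    # The user emits plans they believe are valid under *their* model.
    present = [i for i in range(N_FACTS) if model[i]]
    return tuple(random.sample(present, k=min(3, len(present))))

def resample(particles, weights):
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(particles, probs, k=len(particles))

user_model = random_model()
particles = [random_model() for _ in range(N_PARTICLES)]

for epoch in range(N_DRIFTS):
    for _ in range(PLANS_PER_DRIFT):
        plan = sample_plan(user_model)
        # Weight particles by agreement with the observed plan, with a
        # small floor so post-drift models remain recoverable.
        weights = [1.0 if plan_is_valid(p, plan) else 0.05 for p in particles]
        particles = resample(particles, weights)
        # Mutate a few particles so the filter can track model drift.
        particles = [drift(p) if random.random() < 0.02 else p
                     for p in particles]

    # Score: estimated probability that each held-out plan is valid,
    # averaged over the particle distribution.
    tests = [sample_plan(user_model) for _ in range(TEST_PLANS)]
    acc = sum(
        sum(plan_is_valid(p, t) for p in particles) / N_PARTICLES
        for t in tests
    ) / TEST_PLANS
    print(f"epoch {epoch}: mean P(valid) on test plans = {acc:.2f}")

    user_model = drift(user_model)  # the user's mental model evolves
```

The mutation step is what lets the filter track drift: without it, resampling would collapse onto the pre-drift model, which mirrors the paper's central claim that Marshal learns more accurate models when it expects and exploits model evolution.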