Maintaining Evolving Domain Models
Authors: Dan Bryce, J. Benton, Michael W. Boldt
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that Marshal learns more accurate models of planning domains if it expects and exploits model evolution. We also show that integrating interaction modalities beyond observing plans also helps to learn more accurate models. We illustrate these findings on several domains drawn from the learning track of the International Planning Competition. In our evaluation, we show that Marshal can learn how the user's mental model has changed. We employ a simulated, scripted user agent capable of (1) evolving its mental model multiple times from an initially provided model, (2) sending Marshal answers to queries and simulated (scripted) plans based on those models, and (3) interfacing with Marshal to provide empirical data on the error between the Marshal learned model and the simulated user's model. |
| Researcher Affiliation | Collaboration | Dan Bryce, SIFT, LLC (dbryce@sift.net); J. Benton, NASA ARC & AAMU-RISE Foundation (j.benton@nasa.gov); Michael W. Boldt, SIFT, LLC (mboldt@sift.net) |
| Pseudocode | No | The paper describes the algorithm steps in paragraph form, but does not include structured pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available. |
| Open Datasets | Yes | We evaluate on the parking, spanner, transport, and floortile domains from the Learning Track of the 2014 International Planning Competition (IPC-2014). |
| Dataset Splits | No | The paper describes using '108 plans' as observations for Marshal and a 'testing set of 28 plans' for evaluation, but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or defined subsets in a manner that allows direct reproduction of data partitioning. |
| Hardware Specification | Yes | Our experiments were run on a cluster containing Intel Xeon Harpertown quad-core CPUs, running at 2.83 GHz with 2 GB of memory given to each Marshal instance. |
| Software Dependencies | No | The paper mentions using 'Fast Downward (Helmert, 2006)' but does not provide specific version numbers for this or any other software dependencies, libraries, or operating systems used for the experiments. |
| Experiment Setup | Yes | Marshal uses 128, 256, 512, or 1024 particles in its particle filter. For each planning domain we assume that the user updates their mental model six times and after each change provides a series of 108 plans that they believe are valid. Each change is over a precondition, add or delete effect in an action schema. After each plan, the user answers a series of Marshal's questions in the order that Marshal determines. After each series of plans, and just prior to the next drift in the user's model, we ask Marshal to calculate the probability (given its distribution over models) that each plan within a testing set of 28 plans is valid. |
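
The Research Type and Experiment Setup rows together describe an evaluation loop that can be mirrored in simulation: a scripted user whose model drifts six times, 108 observed plans per drift, a particle filter over candidate models, and a 28-plan held-out score computed just before each drift. The following is a minimal, hypothetical sketch of that protocol, assuming a toy bit-vector stand-in for domain models; the `random_model`, `drift`, `plan_is_valid`, and `sample_plan` helpers, the resampling weights, and the mutation rate are all illustrative assumptions, not Marshal's actual implementation.

```python
import random

# Hypothetical sketch of the evaluation protocol described above: a
# simulated user whose mental model drifts, and a particle-filter
# learner scored on held-out plans before each drift. All names and
# modeling choices here are illustrative, not Marshal's actual code.

N_PARTICLES = 128        # the paper sweeps 128, 256, 512, and 1024
N_DRIFTS = 6             # the user updates their mental model six times
PLANS_PER_DRIFT = 108    # observed plans provided after each change
TEST_PLANS = 28          # held-out plans scored just before each drift

N_FACTS = 20  # toy stand-in for precondition/effect flags in a schema

def random_model():
    # A model is a bit-vector: which preconditions/effects are present.
    return tuple(random.randint(0, 1) for _ in range(N_FACTS))

def drift(model):
    # One change over a precondition, add effect, or delete effect.
    i = random.randrange(N_FACTS)
    m = list(model)
    m[i] ^= 1
    return tuple(m)

def plan_is_valid(model, plan):
    # Toy validity check: every fact the plan relies on must be present.
    return all(model[i] for i in plan)

def sample_plan(model):
    # The user emits plans they believe are valid under *their* model.
    present = [i for i in range(N_FACTS) if model[i]]
    return tuple(random.sample(present, k=min(3, len(present))))

def resample(particles, weights):
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(particles, probs, k=len(particles))

user_model = random_model()
particles = [random_model() for _ in range(N_PARTICLES)]

for epoch in range(N_DRIFTS):
    for _ in range(PLANS_PER_DRIFT):
        plan = sample_plan(user_model)
        # Weight particles by agreement with the observed plan, with a
        # small floor so post-drift models remain recoverable.
        weights = [1.0 if plan_is_valid(p, plan) else 0.05 for p in particles]
        particles = resample(particles, weights)
        # Mutate a few particles so the filter can track model drift.
        particles = [drift(p) if random.random() < 0.02 else p
                     for p in particles]

    # Score: estimated probability that each held-out plan is valid,
    # averaged over the particle distribution.
    tests = [sample_plan(user_model) for _ in range(TEST_PLANS)]
    acc = sum(
        sum(plan_is_valid(p, t) for p in particles) / N_PARTICLES
        for t in tests
    ) / TEST_PLANS
    print(f"epoch {epoch}: mean P(valid) on test plans = {acc:.2f}")

    user_model = drift(user_model)  # the user's mental model evolves
```

The mutation step is what lets the filter track drift: without it, resampling would collapse onto the pre-drift model, which mirrors the paper's central claim that Marshal learns more accurate models when it expects and exploits model evolution.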