State-Based Recurrent SPMNs for Decision-Theoretic Planning under Partial Observability

Authors: Layton Hayes, Prashant Doshi, Swaraj Pawar, Hari Teja Tatavarti

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the performance of the learning algorithm by learning S-RSPMNs on a testbed of several sequential decision-making domains from OpenAI's Gym [Brockman et al., 2016] and RDDLSim [Sanner, 2010], demonstrating that they result in nearly optimal policy values for each.
Researcher Affiliation | Academia | (1) Institute for AI, University of Georgia, Athens, GA 30602; (2) Dept. of Computer Science, University of Georgia, Athens, GA 30602. {layton.hayes25, pdoshi, swaraj.pawar, contactme.hariteja}@uga.edu
Pseudocode | Yes | Algorithm 1 gives the main procedure, LEARNS-RSPMN, for learning the S-RSPMN template.
Open Source Code | Yes | The LEARNS-RSPMN algorithm has been implemented in the SPFlow library [Molina et al., 2019] and is available on GitHub at https://github.com/minimum-LaytonC/SPFlow/tree/rspmn_rdc_rmeufix under the Apache license.
Open Datasets | Yes | As there are very few existing data sets on simulations of discrete partially observable decision-making domains, we developed a new testbed of eight data sets on decision-making problems, listed in Table 1 and available at https://github.com/minimum-LaytonC/SRSPMN_dataset_generators.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or information on cross-validation.
Hardware Specification | Yes | All models were learned on a PC with Intel Xeon E5-2603, RHEL7, 16GB RAM.
Software Dependencies | No | The paper mentions the "SPFlow library" but does not provide a specific version number. It also mentions "RHEL7", which is an operating system, not an ancillary software dependency with a version.
Experiment Setup | Yes | Learning an S-RSPMN requires setting two parameters: horizon h, correlation threshold cthresh. Both S-RSPMN and BCQ models for all domains except Navigation were run for 100 steps (to obtain near-converged values) whereas the Navigation models were evaluated over 10 steps. All other BCQ parameters such as the number of samples and loopback values were set to default.
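
The experiment setup described above could be recorded as a small configuration object. The following is a minimal sketch, not the authors' code: the class name ExperimentSetup, its field names, and the placeholder domain name are hypothetical, and the values of h and cthresh are left unset because this summary does not state them. Only the 100-step default and the 10-step Navigation exception come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Evaluation lengths stated in the summary: 100 steps for every domain
# except Navigation, which is evaluated over 10 steps.
DEFAULT_EVAL_STEPS = 100
EVAL_STEP_OVERRIDES = {"Navigation": 10}


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the two learning parameters and the domain."""

    domain: str
    # Horizon h and correlation threshold cthresh must be set by the user;
    # their values are not given in the summary, so they default to None.
    horizon: Optional[int] = None
    correlation_threshold: Optional[float] = None

    @property
    def eval_steps(self) -> int:
        # Look up the per-domain exception, falling back to the default.
        return EVAL_STEP_OVERRIDES.get(self.domain, DEFAULT_EVAL_STEPS)


if __name__ == "__main__":
    # "SomeOtherDomain" is a placeholder, not a domain named in the paper.
    for name in ("Navigation", "SomeOtherDomain"):
        print(name, ExperimentSetup(domain=name).eval_steps)
```

Keeping the Navigation exception in one lookup table makes the evaluation length explicit for each run, so it does not need to be hard-coded in separate evaluation scripts.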