State-Based Recurrent SPMNs for Decision-Theoretic Planning under Partial Observability
Authors: Layton Hayes, Prashant Doshi, Swaraj Pawar, Hari Teja Tatavarti
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the performance of the learning algorithm by learning S-RSPMNs on a testbed of several sequential decision-making domains from OpenAI's Gym [Brockman et al., 2016] and RDDLSim [Sanner, 2010], demonstrating that they result in nearly optimal policy values for each. |
| Researcher Affiliation | Academia | 1 Institute for AI, University of Georgia, Athens GA 30602 2 Dept. of Computer Science, University of Georgia, Athens, GA 30602 {layton.hayes25, pdoshi, swaraj.pawar, contactme.hariteja}@uga.edu |
| Pseudocode | Yes | Algorithm 1 gives the main procedure, LEARNS-RSPMN, for learning the S-RSPMN template. |
| Open Source Code | Yes | The LEARNS-RSPMN algorithm has been implemented in the SPFlow library [Molina et al., 2019] and is available on GitHub at https://github.com/minimum-Layton C/SPFlow/tree/ rspmn rdc rmeufix under the Apache license. |
| Open Datasets | Yes | As there are very few existing data sets on simulations of discrete partially observable decision-making domains, we developed a new testbed of eight data sets on decision-making problems, listed in Table 1 and available at https://github.com/ minimum-Layton C/SRSPMN dataset generators. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or information on cross-validation. |
| Hardware Specification | Yes | All models were learned on a PC with Intel Xeon E5-2603, RHEL7, 16GB RAM. |
| Software Dependencies | No | The paper mentions the "SPFlow library" but does not provide a specific version number. It also mentions "RHEL7" which is an operating system, not an ancillary software dependency with a version. |
| Experiment Setup | Yes | Learning an S-RSPMN requires setting two parameters: horizon h, correlation threshold cthresh. Both S-RSPMN and BCQ models for all domains except Navigation were run for 100 steps (to obtain near-converged values) whereas the Navigation models were evaluated over 10 steps. All other BCQ parameters such as the number of samples and loopback values were set to default. |
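
The evaluation protocol in the Experiment Setup row (100 steps per run for every domain except Navigation, which uses 10) can be illustrated with a minimal sketch. This is not the authors' code: `ToyEnv`, `rollout_length`, and `average_return` are hypothetical names, the toy environment is fully observable and exists only to make the example runnable, and the episode count is an arbitrary assumption.

```python
import random
from typing import Callable, Dict, Tuple

# Rollout lengths from the setup row: 10 steps for Navigation,
# 100 steps for every other domain.
DEFAULT_STEPS = 100
STEPS_BY_DOMAIN: Dict[str, int] = {"Navigation": 10}


def rollout_length(domain: str) -> int:
    """Return the number of evaluation steps for a domain."""
    return STEPS_BY_DOMAIN.get(domain, DEFAULT_STEPS)


class ToyEnv:
    """Hypothetical stand-in environment (not from the paper).

    The observation is a random bit; the reward is 1.0 when the
    action equals the bit that was just observed, else 0.0.
    """

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.bit = 0

    def reset(self) -> int:
        self.bit = self.rng.randint(0, 1)
        return self.bit

    def step(self, action: int) -> Tuple[int, float]:
        reward = 1.0 if action == self.bit else 0.0
        self.bit = self.rng.randint(0, 1)
        return self.bit, reward


def average_return(
    policy: Callable[[int], int],
    domain: str,
    episodes: int = 50,  # assumed value, not taken from the paper
    seed: int = 0,
) -> float:
    """Average undiscounted return of `policy` over fixed-length rollouts."""
    steps = rollout_length(domain)
    total = 0.0
    for episode in range(episodes):
        env = ToyEnv(seed + episode)
        obs = env.reset()
        for _ in range(steps):
            obs, reward = env.step(policy(obs))
            total += reward
    return total / episodes
```

Calling `average_return(lambda obs: obs, "Navigation")` evaluates a policy that copies the observation over 10-step rollouts; any other domain name falls back to 100 steps.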