reproducibilityindex.ai

State-Based Recurrent SPMNs for Decision-Theoretic Planning under Partial Observability

Authors: Layton Hayes, Prashant Doshi, Swaraj Pawar, Hari Teja Tatavarti

IJCAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test the performance of the learning algorithm by learning S-RSPMNs on a testbed of several sequential decision-making domains from OpenAI's Gym [Brockman et al., 2016] and RDDLSim [Sanner, 2010], demonstrating that they result in nearly optimal policy values for each.
Researcher Affiliation	Academia	1 Institute for AI, University of Georgia, Athens GA 30602 2 Dept. of Computer Science, University of Georgia, Athens, GA 30602 {layton.hayes25, pdoshi, swaraj.pawar, contactme.hariteja}@uga.edu
Pseudocode	Yes	Algorithm 1 gives the main procedure, LEARNS-RSPMN, for learning the S-RSPMN template.
Open Source Code	Yes	The LEARNS-RSPMN algorithm has been implemented in the SPFlow library [Molina et al., 2019] and is available on GitHub at https://github.com/minimum-Layton C/SPFlow/tree/rspmn rdc rmeufix under the Apache license.
Open Datasets	Yes	As there are very few existing data sets on simulations of discrete partially observable decision-making domains, we developed a new testbed of eight data sets on decision-making problems, listed in Table 1 and available at https://github.com/minimum-Layton C/SRSPMN dataset generators.
Dataset Splits	No	The paper does not explicitly provide training/validation/test dataset splits or information on cross-validation.
Hardware Specification	Yes	All models were learned on a PC with Intel Xeon E5-2603, RHEL7, 16GB RAM.
Software Dependencies	No	The paper mentions the "SPFlow library" but does not provide a specific version number. It also mentions "RHEL7" which is an operating system, not an ancillary software dependency with a version.
Experiment Setup	Yes	Learning an S-RSPMN requires setting two parameters: horizon h, correlation threshold cthresh. Both S-RSPMN and BCQ models for all domains except Navigation were run for 100 steps (to obtain near-converged values) whereas the Navigation models were evaluated over 10 steps. All other BCQ parameters such as the number of samples and loopback values were set to default.