Solving Risk-Sensitive POMDPs With and Without Cost Observations
Authors: Ping Hou, William Yeoh, Pradeep Varakantham
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we experimentally show that the new algorithm performs better than FVI in two synthetic domains and a taxi domain (Ziebart et al. 2008; Varakantham et al. 2012) generated with real-world data. We evaluate DFS, FVI (with and without pruning) and its point-based version on three domains: (i) Randomly Generated POMDPs; (ii) the Navigation domain from ICAPS IPPC 2011; and (iii) a taxi domain (Ziebart et al. 2008; Varakantham et al. 2012) generated with real-world data. |
| Researcher Affiliation | Academia | Ping Hou Department of Computer Science New Mexico State University Las Cruces, NM 88003, USA phou@cs.nmsu.edu William Yeoh Department of Computer Science New Mexico State University Las Cruces, NM 88003, USA wyeoh@cs.nmsu.edu Pradeep Varakantham School of Information Systems Singapore Management University Singapore 188065 pradeepv@smu.edu.sg |
| Pseudocode | Yes | Algorithm 1: DFS(b) 1 P_a 0; 2 for actions a A do 3 P_a 0; P_G a 0; 4 for costs c C do 5 ba BELIEF UPDATE(b, a) 6 for states s S and thresholds θ Θ do 7 if s G and θ 0 then 8 P_G a P_G a + ba(s , θ ) 9 P_a P_a + P_G a 10 for observations o Ω do 11 P_G a,o 0 12 for states s S and thresholds θ Θ do 13 if s / G and θ 0 then 14 P_G a,o P_G a,o + ba(s , θ ) O(a, s , o) 15 if P_G a,o > 0 then 16 bo a BELIEF UPDATE(b, a, o) 17 P_a P_a + P_G a,o DFS(bo a) 18 if P_a > P_a then 19 P_a P_a 20 record action a in the policy tree 21 return P_a |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology or links to a code repository. |
| Open Datasets | Yes | We evaluate DFS, FVI (with and without pruning) and its point-based version on three domains: (i) Randomly Generated POMDPs; (ii) the Navigation domain from ICAPS IPPC 2011; and (iii) a taxi domain (Ziebart et al. 2008; Varakantham et al. 2012) generated with real-world data. |
| Dataset Splits | No | The paper describes the generation of random POMDPs and the use of existing domains but does not provide specific details on how data was split into training, validation, or testing sets. |
| Hardware Specification | Yes | We conducted our experiments on a 3.40 GHz machine with 16GB of RAM. |
| Software Dependencies | No | The paper describes the algorithms and experimental setup but does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | We randomly generated POMDPs from 50 to 400 states, 2 actions per state, 2 successors per action, and 2 observations per action-successor pair. Each problem has exactly 1 starting state and 1 goal state. We randomly chose the costs from the range [1, 10] and varied the initial cost thresholds θ0 as a function of the accumulated cost C det of the shortest deterministic path from any starting state in the initial belief state b0. With 1000 belief points, PB-FVI finds close to optimal solutions for the verifiable cases where DFS also solves all instances. In both domains, we set θ0 =1.50 C det. |