Solving Risk-Sensitive POMDPs With and Without Cost Observations

Authors: Ping Hou, William Yeoh, Pradeep Varakantham

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we experimentally show that the new algorithm performs better than FVI in two synthetic domains and a taxi domain (Ziebart et al. 2008; Varakantham et al. 2012) generated with real-world data. We evaluate DFS, FVI (with and without pruning) and its point-based version on three domains: (i) Randomly Generated POMDPs; (ii) the Navigation domain from ICAPS IPPC 2011; and (iii) a taxi domain (Ziebart et al. 2008; Varakantham et al. 2012) generated with real-world data.
Researcher Affiliation Academia Ping Hou Department of Computer Science New Mexico State University Las Cruces, NM 88003, USA phou@cs.nmsu.edu William Yeoh Department of Computer Science New Mexico State University Las Cruces, NM 88003, USA wyeoh@cs.nmsu.edu Pradeep Varakantham School of Information Systems Singapore Management University Singapore 188065 pradeepv@smu.edu.sg
Pseudocode Yes Algorithm 1: DFS(b) 1 P_a 0; 2 for actions a A do 3 P_a 0; P_G a 0; 4 for costs c C do 5 ba BELIEF UPDATE(b, a) 6 for states s S and thresholds θ Θ do 7 if s G and θ 0 then 8 P_G a P_G a + ba(s , θ ) 9 P_a P_a + P_G a 10 for observations o Ω do 11 P_G a,o 0 12 for states s S and thresholds θ Θ do 13 if s / G and θ 0 then 14 P_G a,o P_G a,o + ba(s , θ ) O(a, s , o) 15 if P_G a,o > 0 then 16 bo a BELIEF UPDATE(b, a, o) 17 P_a P_a + P_G a,o DFS(bo a) 18 if P_a > P_a then 19 P_a P_a 20 record action a in the policy tree 21 return P_a
Open Source Code No The paper does not contain any explicit statements about releasing source code for the described methodology or links to a code repository.
Open Datasets Yes We evaluate DFS, FVI (with and without pruning) and its point-based version on three domains: (i) Randomly Generated POMDPs; (ii) the Navigation domain from ICAPS IPPC 2011; and (iii) a taxi domain (Ziebart et al. 2008; Varakantham et al. 2012) generated with real-world data.
Dataset Splits No The paper describes the generation of random POMDPs and the use of existing domains but does not provide specific details on how data was split into training, validation, or testing sets.
Hardware Specification Yes We conducted our experiments on a 3.40 GHz machine with 16GB of RAM.
Software Dependencies No The paper describes the algorithms and experimental setup but does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup Yes We randomly generated POMDPs from 50 to 400 states, 2 actions per state, 2 successors per action, and 2 observations per action-successor pair. Each problem has exactly 1 starting state and 1 goal state. We randomly chose the costs from the range [1, 10] and varied the initial cost thresholds θ0 as a function of the accumulated cost C det of the shortest deterministic path from any starting state in the initial belief state b0. With 1000 belief points, PB-FVI finds close to optimal solutions for the verifiable cases where DFS also solves all instances. In both domains, we set θ0 =1.50 C det.