Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces

Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The simple numerical experiments in this section confirm the theoretical results of Section 6. Specifically, they show that the value function estimates of POSS converge to the QMDP approximation and the value function estimates of POWSS converge to the optimal value function for a toy problem. ... The results plotted in Fig. 2 show the Q-value estimates of POWSS converging toward the optimal Q-values as the width C is increased."
Researcher Affiliation | Academia | "Michael H. Lim¹, Claire J. Tomlin¹ and Zachary N. Sunberg²; ¹University of California, Berkeley; ²University of Colorado, Boulder"
Pseudocode | Yes | "Algorithm 1 Routines common to POWSS and POSS... Algorithm 2 POSS Algorithm: EstimateQ... Algorithm 3 POWSS Algorithm: EstimateQ..." (an illustrative sketch of the weighted sparse-sampling recursion appears after this table)
Open Source Code | Yes | "Both POWSS and POSS were implemented using the POMDPs.jl framework [Egorov et al., 2017], and open-source code can be found at https://github.com/JuliaPOMDP/SparseSampling.jl."
Open Datasets | No | "We consider a simple modification of the classic tiger problem [Kaelbling et al., 1998] that we refer to as the continuous observation tiger (CO-tiger) problem." (a hypothetical continuous-observation tiger-style model is sketched after this table)
Dataset Splits | No | "The discount is 0.95, and the terminal depth is 3."
Hardware Specification | No | "At C = 41, POMCPOW is about an order of magnitude faster than POWSS."
Software Dependencies | No | "Both POWSS and POSS were implemented using the POMDPs.jl framework [Egorov et al., 2017], and open-source code can be found at https://github.com/JuliaPOMDP/SparseSampling.jl."
Experiment Setup | Yes | "The discount is 0.95, and the terminal depth is 3. ... Each data point represents the mean Q-value from 200 runs of the algorithm from a uniformly-distributed belief, with the standard deviation plotted as a ribbon. ... For these tests, the double progressive widening parameters k_o = C, α_o = 0 were used to limit the tree width, with n = C³ iterations to keep the particle density high in wider trees (see Sunberg and Kochenderfer [2018] for parameter definitions)."
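As referenced in the Pseudocode row above, the paper describes POSS and POWSS EstimateQ routines (Algorithms 1-3). The sketch below is a minimal, illustrative reading of the weighted sparse-sampling idea only: C weighted particles are pushed through a generative model, and each sampled observation defines a child belief obtained by reweighting the next-state particles with the observation likelihood. The paper's implementation is in Julia via POMDPs.jl; the Python here, the interface names (gen, obs_pdf, actions), and the exact recursion are assumptions for illustration, not the authors' Algorithm 3 or the SparseSampling.jl API.

```python
def estimate_v(model, states, weights, d, max_depth, gamma):
    """Value estimate at a weighted-particle belief: best Q over all actions."""
    if d >= max_depth:
        return 0.0
    return max(estimate_q(model, states, weights, a, d, max_depth, gamma)
               for a in model.actions)


def estimate_q(model, states, weights, a, d, max_depth, gamma):
    """Weighted sparse-sampling style Q estimate (illustrative sketch only).

    `model` is any object exposing a hypothetical interface:
      model.actions            -- iterable of actions
      model.gen(s, a)          -- sample a (next_state, observation, reward) triple
      model.obs_pdf(o, a, sp)  -- observation density Z(o | a, s')
    """
    # Push each of the C particles through the generative model once.
    samples = [model.gen(s, a) for s in states]          # C triples (s', o, r)
    next_states = [sp for (sp, _, _) in samples]
    total = 0.0
    for w_i, (_, o_i, r_i) in zip(weights, samples):
        # The observation generated by particle i defines one child belief:
        # all C next states, reweighted by the likelihood of o_i
        # (an importance-weighted particle-filter step).
        child_weights = [w * model.obs_pdf(o_i, a, sp)
                         for w, sp in zip(weights, next_states)]
        v_child = estimate_v(model, next_states, child_weights,
                             d + 1, max_depth, gamma)
        total += w_i * (r_i + gamma * v_child)
    return total / sum(weights)
```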
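For concreteness, the hypothetical model below plugs into the estimate_q/estimate_v sketch above in the spirit of the CO-tiger problem mentioned in the Open Datasets row: a tiger behind one of two doors, with listening returning a continuous, noisy observation. The Gaussian observation model, reward values, the class name COTigerLike, and the small width C are all illustrative assumptions; only the discount 0.95 and terminal depth 3 are taken from the quoted experiment setup, and the paper's actual CO-tiger specification may differ.

```python
import math
import random


class COTigerLike:
    """Hypothetical continuous-observation tiger-style POMDP (not the paper's
    exact CO-tiger specification)."""

    actions = ("listen", "open-left", "open-right")
    SIGMA = 1.0  # assumed observation noise when listening

    def gen(self, s, a):
        if a == "listen":
            mean = -1.0 if s == "left" else 1.0
            o = random.gauss(mean, self.SIGMA)       # continuous observation
            return s, o, -1.0                        # small cost for listening
        # Opening a door: penalty if the tiger is behind it, reward otherwise.
        opened = "left" if a == "open-left" else "right"
        r = -100.0 if opened == s else 10.0
        s_next = random.choice(("left", "right"))    # problem resets
        return s_next, 0.0, r

    def obs_pdf(self, o, a, s_next):
        if a != "listen":
            return 1.0                               # uninformative observation
        mean = -1.0 if s_next == "left" else 1.0
        z = (o - mean) / self.SIGMA
        return math.exp(-0.5 * z * z) / (self.SIGMA * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    model, C, gamma, depth = COTigerLike(), 10, 0.95, 3
    # Root belief: C weighted particles drawn uniformly over the tiger's position.
    particles = [random.choice(("left", "right")) for _ in range(C)]
    weights = [1.0 / C] * C
    for a in model.actions:
        q = estimate_q(model, particles, weights, a, 0, depth, gamma)
        print(a, round(q, 2))
```

The width C is kept small here only so the sketch runs quickly; the cost of this kind of full sparse expansion grows on the order of (C·|A|)^D with depth D, which is why the paper's experiments vary C and compare against faster tree-search methods such as POMCPOW.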