Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces
Authors: Michael H. Lim, Claire Tomlin, Zachary N. Sunberg
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The simple numerical experiments in this section confirm the theoretical results of Section 6. Specifically, they show that the value function estimates of POSS converge to the QMDP approximation and the value function estimates of POWSS converge to the optimal value function for a toy problem. ... The results plotted in Fig. 2 show the Q-value estimates of POWSS converging toward the optimal Q-values as the width C is increased. |
| Researcher Affiliation | Academia | Michael H. Lim1 , Claire J. Tomlin1 and Zachary N. Sunberg2 1University of California, Berkeley 2University of Colorado, Boulder |
| Pseudocode | Yes | Algorithm 1 Routines common to POWSS and POSS... Algorithm 2 POSS Algorithm: Estimate Q... Algorithm 3 POWSS Algorithm: Estimate Q... |
| Open Source Code | Yes | Both POWSS and POSS were implemented using the POMDPs.jl framework, [Egorov et al., 2017] and open-source code can be found at https://github.com/Julia POMDP/ Sparse Sampling.jl. |
| Open Datasets | No | We consider a simple modification of the classic tiger problem [Kaelbling et al., 1998] that we refer to as the continuous observation tiger (CO-tiger) problem. |
| Dataset Splits | No | The discount is 0.95, and the terminal depth is 3. |
| Hardware Specification | No | At C = 41, POMCPOW is about an order of magnitude faster than POWSS. |
| Software Dependencies | No | Both POWSS and POSS were implemented using the POMDPs.jl framework, [Egorov et al., 2017] and open-source code can be found at https://github.com/Julia POMDP/ Sparse Sampling.jl. |
| Experiment Setup | Yes | The discount is 0.95, and the terminal depth is 3. ... Each data point represents the mean Q-value from 200 runs of the algorithm from a uniformly-distributed belief, with the standard deviation plotted as a ribbon. ... For these tests, the double progressive widening parameters ko = C, αo = 0 were used to limit the tree width, with n = C3 iterations to keep the particle density high in wider trees (see Sunberg and Kochenderfer [2018] for parameter definitions). |