Covering Number for Efficient Heuristic-based POMDP Planning

Authors: Zongzhang Zhang, David Hsu, Wee Sun Lee

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare PGVI with some existing point-based algorithms in their performance on 65 out of the 68 small benchmark problems from Cassandra's POMDP website and 4 larger robotic problems (Ross et al., 2008; Hsu et al., 2008; Kurniawati et al., 2008; 2011). Empirically, PGVI is competitive with the state-of-the-art point-based POMDP algorithms on 65 small benchmark problems and outperforms them on 4 larger problems.
Researcher Affiliation | Academia | Zongzhang Zhang (ZHANGZZ@COMP.NUS.EDU.SG), David Hsu (DYHSU@COMP.NUS.EDU.SG), Wee Sun Lee (LEEWS@COMP.NUS.EDU.SG); Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
Pseudocode | Yes | Algorithm 1: π = PGVI(ϵ, δ). Algorithm 2: EXPLORE(b, d_b, ϵ, δ).
Open Source Code | No | We used the APPL-0.95 software package to implement the PGVI algorithm, but did not use the MOMDP representation (Ong et al., 2010). http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/ (This refers to a third-party software package used for implementation, not an explicit release of the authors' own code for PGVI.)
Open Datasets | Yes | We compare PGVI with some existing point-based algorithms in their performance on 65 out of the 68 small benchmark problems from Cassandra's POMDP website (http://www.pomdp.org) and 4 larger robotic problems (Ross et al., 2008; Hsu et al., 2008; Kurniawati et al., 2008; 2011).
Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits (e.g., percentages, sample counts, or cross-validation setup) for the problems used in the experiments.
Hardware Specification | No | Our experimental platform is a CPU at 2.40GHz, with 3GB memory. (This provides some specifications but lacks specific CPU model details.)
Software Dependencies | Yes | We used the APPL-0.95 software package to implement the PGVI algorithm.
Experiment Setup | Yes | We set $\delta = (t_{\max} - t)\,\delta_0 / t_{\max}$, where $\delta_0 = 0.5$, $t_{\max}$ represents the upper bound of running time, and $t$ represents the elapsed time in running PGVI, to make PGVI do the best in the available time. Given that the value of $\delta$ changes with time, we use the simpler value of $\mathrm{excess}(b, d_b, \epsilon) = V^U(b) - V^L(b) - \epsilon / \gamma^{d_b}$ to terminate trials. ... In PGVI and SARSOP, $\epsilon$ is set to $0.5\,[V^U(b_0) - V^L(b_0)]$ in the beginning of each trial.
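The three quantities quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' APPL-based implementation: the function names, the argument names, and the clamp that keeps δ from going negative once the time budget is exhausted are all assumptions introduced here. The formulas follow the quoted text, read with the minus signs that were lost in extraction: δ equals (tmax minus t) times δ0 divided by tmax, excess equals the bound gap at b minus ϵ divided by γ raised to the depth of b, and ϵ starts each trial at half the bound gap at b0.

```python
def delta_schedule(t, t_max, delta0=0.5):
    """Time-dependent delta: (t_max - t) * delta0 / t_max.

    The clamp at zero for t > t_max is an assumption added for safety;
    the quoted text does not say what happens after the time budget.
    """
    remaining = max(t_max - t, 0.0)
    return remaining * delta0 / t_max


def excess(v_upper, v_lower, epsilon, gamma, depth):
    """Excess uncertainty at a belief: (V^U(b) - V^L(b)) - epsilon / gamma**depth.

    v_upper and v_lower are the upper and lower bound values at the belief,
    depth is the belief's depth d_b in the search tree (hypothetical names).
    """
    return (v_upper - v_lower) - epsilon / gamma ** depth


def initial_epsilon(v_upper_b0, v_lower_b0):
    """Epsilon at the start of a trial: half the bound gap at the initial belief."""
    return 0.5 * (v_upper_b0 - v_lower_b0)


if __name__ == "__main__":
    # Halfway through a 100-second budget, delta has dropped from 0.5 to 0.25.
    print(delta_schedule(50.0, 100.0))
    # A trial would stop exploring a belief once its excess is no longer positive.
    eps = initial_epsilon(10.0, 4.0)
    print(excess(10.0, 4.0, eps, gamma=0.5, depth=1))
```

Under this reading, δ shrinks linearly to zero as the time budget is used up, and a belief with non-positive excess is one where a trial stops.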