Covering Number for Efficient Heuristic-based POMDP Planning
Authors: Zongzhang Zhang, David Hsu, Wee Sun Lee
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare PGVI with some existing point-based algorithms in their performance on 65 out of the 68 small benchmark problems from Cassandra's POMDP website and 4 larger robotic problems (Ross et al., 2008; Hsu et al., 2008; Kurniawati et al., 2008; 2011). Empirically, PGVI is competitive with the state-of-the-art point-based POMDP algorithms on 65 small benchmark problems and outperforms them on 4 larger problems. |
| Researcher Affiliation | Academia | Zongzhang Zhang (ZHANGZZ@COMP.NUS.EDU.SG), David Hsu (DYHSU@COMP.NUS.EDU.SG), Wee Sun Lee (LEEWS@COMP.NUS.EDU.SG); Department of Computer Science, National University of Singapore, Singapore 117417, Singapore |
| Pseudocode | Yes | Algorithm 1: $\pi = \text{PGVI}(\epsilon, \delta)$. Algorithm 2: $\text{EXPLORE}(b, d_b, \epsilon, \delta)$. |
| Open Source Code | No | We used the APPL-0.95 software package (http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/) to implement the PGVI algorithm, but did not use the MOMDP representation (Ong et al., 2010). (This refers to a third-party software package used for implementation, not an explicit release of the authors' own code for PGVI.) |
| Open Datasets | Yes | We compare PGVI with some existing point-based algorithms in their performance on 65 out of the 68 small benchmark problems from Cassandra's POMDP website (http://www.pomdp.org) and 4 larger robotic problems (Ross et al., 2008; Hsu et al., 2008; Kurniawati et al., 2008; 2011). |
| Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits (e.g., percentages, sample counts, or cross-validation setup) for the problems used in the experiments. |
| Hardware Specification | No | Our experimental platform is a CPU at 2.40GHz, with 3GB memory. (This provides some specifications but lacks specific CPU model details.) |
| Software Dependencies | Yes | We used the APPL-0.95 software package to implement the PGVI algorithm. |
| Experiment Setup | Yes | We set $\delta = (t_{\max} - t)\,\delta_0 / t_{\max}$, where $\delta_0 = 0.5$, $t_{\max}$ represents the upper bound of running time, and $t$ represents the elapsed time in running PGVI, to make PGVI do the best in the available time. Given that the value of $\delta$ changes with time, we use the simpler value of $\text{excess}(b, d_b, \epsilon) = V^U(b) - V^L(b) - \epsilon / \gamma^{d_b}$ to terminate trials. ... In PGVI and SARSOP, $\epsilon$ is set to $0.5\,[V^U(b_0) - V^L(b_0)]$ in the beginning of each trial. |
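
The quantities quoted in the Experiment Setup row can be written out as a minimal Python sketch. This is an illustration of the three formulas only, not the authors' implementation or any part of APPL. The function names (`delta_schedule`, `excess`, `trial_epsilon`) are hypothetical, and clamping δ at zero once the time budget is exhausted is an added assumption that the quoted text does not state.

```python
def delta_schedule(elapsed, t_max, delta0=0.5):
    """Time-dependent delta = (t_max - t) * delta0 / t_max.

    Assumption (not in the quoted text): delta is clamped at 0
    once the elapsed time exceeds the budget t_max.
    """
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    remaining = max(t_max - elapsed, 0.0)
    return remaining * delta0 / t_max


def excess(upper, lower, epsilon, gamma, depth):
    """Simplified excess: V^U(b) - V^L(b) - epsilon / gamma**depth.

    `upper` and `lower` stand for the bound values V^U(b) and V^L(b)
    at a belief b reached at depth d_b in the search tree.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return upper - lower - epsilon / gamma ** depth


def trial_epsilon(upper_b0, lower_b0):
    """Per-trial target gap: 0.5 * [V^U(b0) - V^L(b0)]."""
    return 0.5 * (upper_b0 - lower_b0)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the formulas.
    print(delta_schedule(elapsed=50.0, t_max=100.0))
    eps = trial_epsilon(upper_b0=10.0, lower_b0=4.0)
    print(excess(upper=10.0, lower=4.0, epsilon=eps, gamma=0.5, depth=1))
```

Under the quoted rule, a trial would stop exploring a belief once its excess is no longer positive; because ϵ is divided by γ raised to the depth, the tolerated gap grows for beliefs deeper in the tree.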