A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees
Authors: Marcus Hoerger, Hanna Kurniawati, Dirk Kroese, Nan Ye
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "…yet empirically outperforms them on several continuous-action POMDP problems, particularly for problems with higher-dimensional action spaces." We tested LCEOPT on four decision-making problems under partial observability: Cont Tag, Pushbox2D/3D, Parking2D/3D, and Sensor Placement-D. |
| Researcher Affiliation | Academia | ¹School of Mathematics & Physics, The University of Queensland, Queensland, Australia; ²School of Computing, Australian National University, ACT, Australia |
| Pseudocode | Yes | Algorithm 1 shows the key steps of LCEOPT, with detailed pseudocode provided in Appendix A. |
| Open Source Code | Yes | The source code of LCEOPT is available at https://github.com/hoergems/LCEOPT. |
| Open Datasets | No | The paper mentions benchmark problems like 'Cont Tag', 'Pushbox2D/3D', 'Parking2D/3D', and 'Sensor Placement-D', which are described as simulated environments or problem definitions. It does not provide concrete access information (links, DOIs, specific citations with author/year for datasets) for publicly available datasets used for training. |
| Dataset Splits | No | The paper conducts simulation runs for evaluation but does not describe specific training, validation, or test dataset splits, as it operates on simulated environments rather than pre-defined datasets. |
| Hardware Specification | Yes | All simulations were run single-threaded on an AMD EPYC 7003 CPU with 4GB of memory. |
| Software Dependencies | No | The paper states, 'we implemented LCEOPT, the tree baseline solvers POMCPOW, VOMCPOW and ADVT, and the problem scenarios in C++ using the OPPT framework', but it does not specify version numbers for C++ or the OPPT framework. |
| Experiment Setup | Yes | For the Cont Tag problem, we set the number of candidate policies to N = 493 and the number of trajectories per parameter vector to L = 103 for both algorithms. For the Sensor Placement-12 problem, we set N = 496 and L = 11. For each solver and problem scenario, we then used the best parameter point and ran 1,000 simulation runs with a fixed planning time of 1s (measured in CPU time) per planning step. |
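
The Experiment Setup row refers to two knobs, the number of candidate policies N and the number of trajectories per parameter vector L. The sketch below shows what these knobs control in a plain cross-entropy search. It is a generic cross-entropy method written under stated assumptions, not LCEOPT itself: parameter vectors are ordinary real vectors rather than policy-tree parameters, there is no lazy evaluation and no CPU-time budget, and the helper `noisy_return` (a hypothetical stand-in for one simulated trajectory), the elite count, and the per-dimension Gaussian refit are illustrative choices not taken from the paper.

```python
import random
import statistics


def noisy_return(theta, rng):
    """Hypothetical stand-in for one simulated trajectory.

    The expected return peaks at theta = (1, -2); Gaussian noise mimics
    the randomness of a single POMDP rollout.
    """
    target = (1.0, -2.0)
    squared_error = sum((t - c) ** 2 for t, c in zip(theta, target))
    return -squared_error + rng.gauss(0.0, 0.05)


def cross_entropy_search(dim, n_candidates, n_trajectories, n_elite, iterations, seed=0):
    """Generic cross-entropy search (not LCEOPT's lazy variant).

    n_candidates plays the role of N (candidates per iteration) and
    n_trajectories the role of L (rollouts averaged per candidate).
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim
    for _ in range(iterations):
        # Sample N candidate parameter vectors from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n_candidates)
        ]
        # Estimate each candidate's value by averaging L simulated returns.
        scored = []
        for theta in candidates:
            value = statistics.fmean(
                noisy_return(theta, rng) for _ in range(n_trajectories)
            )
            scored.append((value, theta))
        # Keep the n_elite candidates with the highest estimated value.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elites = [theta for _, theta in scored[:n_elite]]
        # Refit the sampling distribution to the elites, per dimension.
        mean = [statistics.fmean(e[d] for e in elites) for d in range(dim)]
        std = [max(statistics.pstdev([e[d] for e in elites]), 1e-3) for d in range(dim)]
    return mean


if __name__ == "__main__":
    print(cross_entropy_search(dim=2, n_candidates=50, n_trajectories=10,
                               n_elite=10, iterations=30))
```

With these illustrative settings the returned mean should land near (1, -2). The design trade-off is visible in the inner loop: a larger L lowers the variance of each candidate's value estimate, but every candidate then costs L simulations, so under a fixed per-step planning time N and L presumably compete for the same budget.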