A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees

Authors: Marcus Hoerger, Hanna Kurniawati, Dirk Kroese, Nan Ye

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | yet empirically outperforms them on several continuous-action POMDP problems, particularly for problems with higher-dimensional action spaces. We tested LCEOPT on 4 decision making problems under partial observability: Cont Tag, Pushbox2D/3D, Parking2D/3D, Sensor Placement-D.
Researcher Affiliation | Academia | (1) School of Mathematics & Physics, The University of Queensland, Queensland, Australia; (2) School of Computing, Australian National University, ACT, Australia
Pseudocode | Yes | Algorithm 1 shows the key steps of LCEOPT, with detailed pseudocodes provided in Appendix A 1.
Open Source Code | Yes | The source code of LCEOPT is available at https://github.com/hoergems/LCEOPT.
Open Datasets | No | The paper mentions benchmark problems like 'Cont Tag', 'Pushbox2D/3D', 'Parking2D/3D', and 'Sensor Placement-D', which are described as simulated environments or problem definitions. It does not provide concrete access information (links, DOIs, specific citations with author/year for datasets) for publicly available datasets used for training.
Dataset Splits | No | The paper conducts simulation runs for evaluation but does not describe specific training, validation, or test dataset splits, as it operates on simulated environments rather than pre-defined datasets.
Hardware Specification | Yes | All simulations were run single-threaded on an AMD EPYC 7003 CPU with 4GB of memory.
Software Dependencies | No | The paper states, 'we implemented LCEOPT, the tree baseline solvers POMCPOW, VOMCPOW and ADVT, and the problem scenarios in C++ using the OPPT framework', but it does not specify version numbers for C++ or the OPPT framework.
Experiment Setup | Yes | For the Cont Tag problem, we set the number of candidate policies to N = 493 and the number of trajectories per parameter vector to L = 103 for both algorithms. For the Sensor Placement-12 problem, we set N = 496 and L = 11. For each solver and problem scenario, we then used the best parameter point and ran 1,000 simulation runs with a fixed planning time of 1s (measured in CPU time) per planning step.
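
To illustrate the roles of the two hyperparameters in the Experiment Setup row, here is a minimal sketch of a generic Gaussian cross-entropy search in Python. It is not the paper's LCEOPT implementation (which is written in C++ and searches over policy trees); it only shows how N candidate parameter vectors might each be scored by averaging L noisy trajectories. The function names, the elite fraction, the iteration count, and the toy objective are all assumptions made for this illustration.

```python
import random
import statistics


def cross_entropy_search(evaluate, dim, n_candidates, n_rollouts,
                         elite_frac=0.1, iterations=20, seed=0):
    """Generic Gaussian cross-entropy method (illustrative, not LCEOPT)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(2, int(elite_frac * n_candidates))
    for _ in range(iterations):
        # Sample N candidate parameter vectors from the current distribution.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(n_candidates)]
        # Score each candidate by the mean return of L noisy trajectories.
        scored = []
        for cand in candidates:
            returns = [evaluate(cand, rng) for _ in range(n_rollouts)]
            scored.append((statistics.fmean(returns), cand))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elites = [cand for _, cand in scored[:n_elite]]
        # Refit the sampling distribution to the elite candidates.
        for d in range(dim):
            column = [e[d] for e in elites]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.stdev(column) + 1e-6
    return mean


def noisy_return(theta, rng):
    # Hypothetical toy objective: peak at (1, -2) plus small rollout noise.
    clean = -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)
    return clean + rng.gauss(0.0, 0.05)


# Small values of N and L keep the demo fast; the paper reports far larger N.
best = cross_entropy_search(noisy_return, dim=2, n_candidates=50, n_rollouts=5)
print(best)
```

In this sketch, `n_candidates` plays the role of N and `n_rollouts` the role of L. A larger L reduces the noise in each candidate's score at the cost of fewer distribution updates within a fixed planning budget.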