Information-guided Planning: An Online Approach for Partially Observable Problems

Authors: Matheus Aparecido Do Carmo Alves, Amokh Varma, Yehia Elkhatib, Leandro Soriano Marcolino

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper presents IB-POMCP, a novel algorithm for online planning under partial observability. Our approach enhances the decision-making process by using estimations of the world belief's entropy to guide a tree search process and surpass the limitations of planning in scenarios with sparse reward configurations. By performing what we denominate as an information-guided planning process, the algorithm, which incorporates a novel I-UCB function, shows significant improvements in reward and reasoning time compared to state-of-the-art baselines in several benchmark scenarios, along with theoretical convergence guarantees.
Researcher Affiliation | Academia | Matheus Aparecido do Carmo Alves, Lancaster University, Lancaster, United Kingdom, m.a.docarmoalves@lancaster.ac.uk; Amokh Varma, Indian Institute of Technology Delhi, India, amokhvarma@gmail.com; Yehia Elkhatib, University of Glasgow, Glasgow, United Kingdom, yehia.elkhatib@glasgow.ac.uk; Leandro Soriano Marcolino, Lancaster University, Lancaster, United Kingdom, l.marcolino@lancaster.ac.uk
Pseudocode | Yes | Algorithm/Pseudo-code: To highlight the difference between POMCP and IB-POMCP, we present the complete pseudocode of our algorithm in Appendix A1.
Open Source Code | Yes | IB-POMCP's implementation is publicly available on GitHub: https://github.com/lsmcolab/ib-pomcp/
Open Datasets | Yes | We define five well-known domains as our benchmarks. See Appendix C for more information. All environments used were obtained from the literature. The Tiger problem is a well-known standard problem [36]. For the Maze domain, we based our design on Thomas et al. 2020 [36]. For the Rock Sample problem, we designed our own scenarios but based the implementation on Thomas et al. 2020 [36]. For Tag and Laser Tag, we used the scenarios proposed by Somani et al. (2013) [34]. For Foraging [2], we proposed our own configurations for each scenario.
Dataset Splits | No | The paper specifies running experiments across five benchmark domains and calculating mean results across 50 executions, but it does not provide training, validation, or test splits in the traditional sense, because it uses simulation environments rather than static datasets with predefined splits.
Hardware Specification | Yes | Each run was performed in a single node of a high-performance cluster containing 16 cores of Intel Ivy Bridge processors and 64 GB RAM.
Software Dependencies | No | The paper mentions that 'All experiments were implemented using AdLeap-MAS [9]', but it does not provide specific version numbers for this software or any other libraries or dependencies used in the experiments.
Experiment Setup | Yes | Finally, we refer the reader to our Appendix D for information about our hyperparameter set. We used a single hyperparameter set for all Monte-Carlo Tree Search-based methods: historical weight/discount factor γ = 0.95; maximum depth for the tree: 20; maximum number of simulations: 250 (per search). Finally, considering IB-POMCP's practical enhancement, we applied q = 0.2, hence α ∈ [0.2, 0.8].
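
The hyperparameters reported in the Experiment Setup row can be collected in one place when attempting to reproduce the runs. The sketch below is illustrative only: `SearchConfig`, its field names, and `clamp_alpha` are hypothetical and are not taken from the authors' repository. It also assumes that the reported range α ∈ [0.2, 0.8] corresponds to the interval [q, 1 - q], which matches the stated numbers but is not confirmed by the text above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical container for the values reported in the Experiment Setup row."""

    gamma: float = 0.95         # historical weight / discount factor
    max_depth: int = 20         # maximum depth for the tree
    max_simulations: int = 250  # maximum number of simulations per search
    q: float = 0.2              # IB-POMCP practical-enhancement parameter

    @property
    def alpha_bounds(self) -> Tuple[float, float]:
        # ASSUMPTION: the bounds are [q, 1 - q]; this reproduces the
        # reported interval [0.2, 0.8] for q = 0.2.
        return (self.q, 1.0 - self.q)

    def clamp_alpha(self, alpha: float) -> float:
        # Restrict a candidate alpha to the assumed interval.
        low, high = self.alpha_bounds
        return min(max(alpha, low), high)


if __name__ == "__main__":
    config = SearchConfig()
    print(config)
    print(config.alpha_bounds)
```

Keeping the values in a frozen dataclass makes it harder to alter a setting by accident between runs, which is useful when the same set is shared across all Monte-Carlo Tree Search-based methods.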