reproducibilityindex.ai

Information-guided Planning: An Online Approach for Partially Observable Problems

Authors: Matheus Aparecido Do Carmo Alves, Amokh Varma, Yehia Elkhatib, Leandro Soriano Marcolino

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This paper presents IB-POMCP, a novel algorithm for online planning under partial observability. Our approach enhances the decision-making process by using estimations of the world belief's entropy to guide a tree search process and surpass the limitations of planning in scenarios with sparse reward configurations. By performing what we denominate as an information-guided planning process, the algorithm, which incorporates a novel I-UCB function, shows significant improvements in reward and reasoning time compared to state-of-the-art baselines in several benchmark scenarios, along with theoretical convergence guarantees.
Researcher Affiliation	Academia	Matheus Aparecido do Carmo Alves, Lancaster University, Lancaster, United Kingdom, m.a.docarmoalves@lancaster.ac.uk; Amokh Varma, Indian Institute of Technology, Delhi, India, amokhvarma@gmail.com; Yehia Elkhatib, University of Glasgow, Glasgow, United Kingdom, yehia.elkhatib@glasgow.ac.uk; Leandro Soriano Marcolino, Lancaster University, Lancaster, United Kingdom, l.marcolino@lancaster.ac.uk
Pseudocode	Yes	Algorithm/Pseudo-code: To highlight the difference between POMCP and IB-POMCP, we present the complete pseudocode of our algorithm in Appendix A1.
Open Source Code	Yes	IB-POMCP's implementation is publicly available on GitHub: https://github.com/lsmcolab/ib-pomcp/
Open Datasets	Yes	We define five well-known domains as our benchmarks. See Appendix C for more information. All environments used were obtained from the literature. The Tiger problem is a well-known standard problem [36]. For the Maze domain, we based our design on Thomas et al. 2020 [36]. For the Rock Sample problem, we designed our own scenarios but based the implementation on Thomas et al. 2020 [36]. For Tag and Laser Tag, we used the scenarios proposed by Somani et al. (2013) [34]. For Foraging [2], we proposed our own configurations for each scenario.
Dataset Splits	No	The paper specifies running experiments across five benchmark domains and calculating mean results across 50 executions, but does not provide details on specific training, validation, or test dataset splits in the traditional sense, as it involves simulation environments rather than static datasets with predefined splits.
Hardware Specification	Yes	Each run was performed in a single node of a high-performance cluster containing 16 cores of Intel Ivy Bridge processors and 64 GB RAM.
Software Dependencies	No	The paper mentions that 'All experiments were implemented using AdLeap-MAS [9]', but it does not provide specific version numbers for this software or any other libraries or dependencies used in the experiments.
Experiment Setup	Yes	Finally, we refer the reader to our Appendix D for information about our hyperparameter set. We used a single hyperparameter set for all Monte-Carlo Tree Search-based methods: historical weight/discount factor: γ = 0.95; maximum depth for the tree: 20; maximum number of simulations: 250 (per search). Finally, considering IB-POMCP's practical enhancement, we applied q = 0.2, hence α ∈ [0.2, 0.8].
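
For readers attempting a reproduction, the hyperparameters reported in the Experiment Setup row can be collected in one place. The following is a minimal sketch only: the class name, field names, and the `clamp_alpha` helper are hypothetical and are not taken from the IB-POMCP repository or from AdLeap-MAS. It also assumes that the reported interval [0.2, 0.8] corresponds to [q, 1 - q], which the summary implies but does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchConfig:
    """Values reported in the Experiment Setup row; all names are hypothetical."""

    discount_factor: float = 0.95  # gamma (historical weight / discount factor)
    max_depth: int = 20            # maximum depth for the tree
    max_simulations: int = 250     # maximum number of simulations per search
    q: float = 0.2                 # value applied for IB-POMCP's practical enhancement

    def alpha_bounds(self) -> Tuple[float, float]:
        # Assumption: the reported range for alpha, [0.2, 0.8], is [q, 1 - q].
        return (self.q, 1.0 - self.q)

    def clamp_alpha(self, alpha: float) -> float:
        # Assumption: alpha is kept inside the bounds by simple clamping.
        low, high = self.alpha_bounds()
        return min(max(alpha, low), high)


config = SearchConfig()
```

Calling `config.clamp_alpha(0.05)` would return the lower bound under these assumptions; how the actual implementation keeps α within its range should be checked against the repository linked in the Open Source Code row.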