A Bayesian Approach to Online Planning

Authors: Nir Greshler, David Ben Eli, Carmel Rabinovitz, Gabi Guetta, Liran Gispan, Guy Zohar, Aviv Tamar

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that on the ProcGen Maze and Leaper environments, when the uncertainty estimates are accurate but the neural network output is inaccurate, our Bayesian approach searches the tree much more effectively. From Section 4 (Experiments): We aim to demonstrate the potential of using uncertainty estimates in online planning.
Researcher Affiliation | Collaboration | Nir Greshler (1), David Ben Eli (1), Carmel Rabinovitz (1), Gabi Guetta (1), Liran Gispan (1), Guy Zohar (1), Aviv Tamar (2). Affiliations: (1) General Motors, Advanced Technical Center, Israel; (2) Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel.
Pseudocode | Yes | Algorithm 1 (Thompson Sampling Tree Search) and Algorithm 2 (Bayes-UCB Tree Search); see the Thompson-sampling sketch after the table.
Open Source Code | Yes | Our code is available at: https://github.com/nirgreshler/bayesian-online-planning
Open Datasets | Yes | To this end, we focus on two tasks from the ProcGen suite of procedurally generated game environments (Cobbe et al., 2020), Maze and Leaper.
Dataset Splits | No | We consider a set of N_train training levels and disjoint set of N_test test levels. We train our neural networks on the training levels, and evaluate their performance on the test levels. In particular, we use a dataset of 150 samples for training, and also present the results of an evaluation on this set (see Figure 4a). For testing, we use a disjoint set of 500 samples to evaluate the different planners (see Figure 4). (Explanation: The paper describes training and test splits, but does not explicitly mention a separate validation split or how one would be derived.)
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU model, CPU type, memory) used to run its experiments.
Software Dependencies | No | Our neural network model was trained using the following parameters: The optimizer is an Adam optimizer with fixed learning rate of 0.001, β coefficients of 0.9 and 0.999 for running averages of gradient and its square respectively, and ϵ of 1e-8. (Explanation: The paper specifies the optimizer and architecture but gives no software versions for languages, libraries, or frameworks such as Python, PyTorch, or TensorFlow.)
Experiment Setup | Yes | Our neural network model was trained using the following parameters: The optimizer is an Adam optimizer with fixed learning rate of 0.001, β coefficients of 0.9 and 0.999 for running averages of gradient and its square respectively, and ϵ of 1e-8. In our experiments we set N_buffer = 40000 and N_bs = 200. The annotator is run on 150 different mazes, each for up to 200 environment steps (early stopping if reaching the goal), and each step has a search budget of 250. See the configuration sketch after the table.
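
For readers skimming the pseudocode row, the following is a minimal sketch of the Thompson-sampling selection rule referenced by Algorithm 1: each action's value at a node is drawn from a per-action Gaussian belief, and the sampled argmax is followed. The Gaussian (mean, std) parameterization, the Node class, and all numbers are illustrative assumptions for this sketch, not the authors' implementation (which lives in the linked repository).

import numpy as np


class Node:
    """A search-tree node holding a Gaussian belief over each action's value.

    The (mean, std) parameterization is an assumption made for this sketch;
    the paper's Algorithm 1 (Thompson Sampling Tree Search) defines the
    actual posterior it maintains and updates.
    """

    def __init__(self, means, stds):
        self.means = np.asarray(means, dtype=float)  # belief means, one per action
        self.stds = np.asarray(stds, dtype=float)    # belief std-devs, one per action


def thompson_select(node, rng):
    """Sample one value per action from the node's belief and pick the argmax."""
    sampled_values = rng.normal(node.means, node.stds)
    return int(np.argmax(sampled_values))


# Usage: a node with three actions; uncertain actions still get tried.
rng = np.random.default_rng(0)
node = Node(means=[0.2, 0.5, 0.4], stds=[0.3, 0.05, 0.4])
counts = np.bincount([thompson_select(node, rng) for _ in range(1000)], minlength=3)
print(counts)  # action 1 is chosen most, but actions 0 and 2 are explored too

Actions with large posterior uncertainty get sampled more often than a purely greedy rule would pick them, which is the behavior the paper exploits when the network output is inaccurate but its uncertainty estimate is not.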
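The experiment-setup row pins down the optimizer hyperparameters and search budgets. The snippet below simply collects those reported values in one place, assuming PyTorch as the framework (the paper does not state which one it used) and a placeholder network standing in for the authors' architecture.

import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
LEARNING_RATE = 1e-3          # "fixed learning rate of 0.001"
BETAS = (0.9, 0.999)          # Adam running-average coefficients
EPS = 1e-8                    # Adam epsilon
N_BUFFER = 40_000             # N_buffer as reported in the paper
N_BS = 200                    # N_bs as reported in the paper
SEARCH_BUDGET = 250           # tree-search budget per environment step
MAX_ENV_STEPS = 200           # per maze, with early stopping at the goal
N_ANNOTATION_MAZES = 150      # mazes the annotator is run on

# Placeholder network: the paper's architecture is not reproduced here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 16))

optimizer = torch.optim.Adam(
    model.parameters(), lr=LEARNING_RATE, betas=BETAS, eps=EPS
)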