A Bayesian Approach to Online Planning

Authors: Nir Greshler, David Ben Eli, Carmel Rabinovitz, Gabi Guetta, Liran Gispan, Guy Zohar, Aviv Tamar

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that on the ProcGen Maze and Leaper environments, when the uncertainty estimates are accurate but the neural network output is inaccurate, our Bayesian approach searches the tree much more effectively. From Section 4 (Experiments): We aim to demonstrate the potential of using uncertainty estimates in online planning.
Researcher Affiliation | Collaboration | Nir Greshler (1), David Ben Eli (1), Carmel Rabinovitz (1), Gabi Guetta (1), Liran Gispan (1), Guy Zohar (1), Aviv Tamar (2). Affiliations: (1) General Motors, Advanced Technical Center, Israel; (2) Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel.
Pseudocode | Yes | Algorithm 1 (Thompson Sampling Tree Search) and Algorithm 2 (Bayes-UCB Tree Search); see the Thompson-sampling sketch after the table.
Open Source Code | Yes | Our code is available at: https://github.com/nirgreshler/bayesian-online-planning
Open Datasets | Yes | To this end, we focus on two tasks from the ProcGen suite of procedurally generated game environments (Cobbe et al., 2020), Maze and Leaper.
Dataset Splits | No | We consider a set of N_train training levels and disjoint set of N_test test levels. We train our neural networks on the training levels, and evaluate their performance on the test levels. In particular, we use a dataset of 150 samples for training, and also present the results of an evaluation on this set (see Figure 4a). For testing, we use a disjoint set of 500 samples to evaluate the different planners (see Figure 4). (Explanation: The paper describes training and test splits, but does not explicitly mention a separate validation split or how one would be derived.)
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU model, CPU type, memory) used to run its experiments.
Software Dependencies | No | Our neural network model was trained using the following parameters: The optimizer is an Adam optimizer with fixed learning rate of 0.001, β coefficients of 0.9 and 0.999 for running averages of gradient and its square respectively, and ϵ of 1e-8. (Explanation: The paper specifies the optimizer and architecture but gives no software versions for languages, libraries, or frameworks such as Python, PyTorch, or TensorFlow.)
Experiment Setup | Yes | Our neural network model was trained using the following parameters: The optimizer is an Adam optimizer with fixed learning rate of 0.001, β coefficients of 0.9 and 0.999 for running averages of gradient and its square respectively, and ϵ of 1e-8. In our experiments we set N_buffer = 40000 and N_bs = 200. The annotator is run on 150 different mazes, each for up to 200 environment steps (early stopping if reaching the goal), and each step has a search budget of 250. See the configuration sketch after the table.
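
For readers skimming the pseudocode row, the following is a minimal sketch of the Thompson-sampling selection rule referenced by Algorithm 1: each action's value at a node is drawn from a per-action Gaussian belief, and the sampled argmax is followed. The Gaussian (mean, std) parameterization, the Node class, and all numbers are illustrative assumptions for this sketch, not the authors' implementation (which lives in the linked repository).

import numpy as np


class Node:
    """A search-tree node holding a Gaussian belief over each action's value.

    The (mean, std) parameterization is an assumption made for this sketch;
    the paper's Algorithm 1 (Thompson Sampling Tree Search) defines the
    actual posterior it maintains and updates.
    """

    def __init__(self, means, stds):
        self.means = np.asarray(means, dtype=float)  # belief means, one per action
        self.stds = np.asarray(stds, dtype=float)    # belief std-devs, one per action


def thompson_select(node, rng):
    """Sample one value per action from the node's belief and pick the argmax."""
    sampled_values = rng.normal(node.means, node.stds)
    return int(np.argmax(sampled_values))


# Usage: a node with three actions; uncertain actions still get tried.
rng = np.random.default_rng(0)
node = Node(means=[0.2, 0.5, 0.4], stds=[0.3, 0.05, 0.4])
counts = np.bincount([thompson_select(node, rng) for _ in range(1000)], minlength=3)
print(counts)  # action 1 is chosen most, but actions 0 and 2 are explored too

Actions with large posterior uncertainty get sampled more often than a purely greedy rule would pick them, which is the behavior the paper exploits when the network output is inaccurate but its uncertainty estimate is not.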
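The experiment-setup row pins down the optimizer hyperparameters and search budgets. The snippet below simply collects those reported values in one place, assuming PyTorch as the framework (the paper does not state which one it used) and a placeholder network standing in for the authors' architecture.

import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
LEARNING_RATE = 1e-3          # "fixed learning rate of 0.001"
BETAS = (0.9, 0.999)          # Adam running-average coefficients
EPS = 1e-8                    # Adam epsilon
N_BUFFER = 40_000             # N_buffer as reported in the paper
N_BS = 200                    # N_bs as reported in the paper
SEARCH_BUDGET = 250           # tree-search budget per environment step
MAX_ENV_STEPS = 200           # per maze, with early stopping at the goal
N_ANNOTATION_MAZES = 150      # mazes the annotator is run on

# Placeholder network: the paper's architecture is not reproduced here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 16))

optimizer = torch.optim.Adam(
    model.parameters(), lr=LEARNING_RATE, betas=BETAS, eps=EPS
)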