On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search

Authors: Piyush Khandelwal, Elad Liebman, Scott Niekum, Peter Stone

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.
Researcher Affiliation Academia Piyush Khandelwal PIYUSHK@CS.UTEXAS.EDU Elad Liebman ELADLIEB@CS.UTEXAS.EDU Scott Niekum SNIEKUM@CS.UTEXAS.EDU Peter Stone PSTONE@CS.UTEXAS.EDU Department of Computer Science, University of Texas at Austin, 2317 Speedway, Stop D9500, Austin, TX 78712 USA
Pseudocode Yes Algorithm 1 outlines the pseudocode for MCTS planning. ... Algorithm 2 BACKPROPAGATE(trajectory) for λ-return. ... Algorithm 3 BACKPROPAGATE(trajectory) for γ-return.
Open Source Code No No explicit statement or link was found indicating that the authors have released the source code for their proposed MCTS variants.
Open Datasets Yes IPC domains are described using RDDL (Sanner, 2010)1, and parsed using the RDDL parser available with the PROST planner (Keller and Eyerich, 2012). 1Domains are available at https://github.com/ssanner/rddlsim. This paper uses updated definitions for Recon and Skill Learning, as explained in rddlsim/files/final comp/ERRATA.txt.
Dataset Splits No The paper describes using IPC domains as benchmarks and running 1,000 independent trials, but it does not specify explicit train/validation/test dataset splits in the context of supervised learning for model training. The 'domains' serve as problem instances for evaluation rather than partitioned datasets.
Hardware Specification No The paper does not provide specific details about the hardware used, such as GPU/CPU models or memory specifications.
Software Dependencies No The paper mentions 'All the code is implemented in C++', and refers to 'RDDL parser available with the PROST planner', but no specific version numbers for any software dependencies or libraries are provided.
Experiment Setup Yes Before taking each action, the agent plans for 10,000 simulations (trajectories) while reusing the appropriate sub-tree from the previous search...To evaluate the effect of backup strategies in isolation, planning actions are selected using uniform action selection ( 2.1.1)...The MCTS search depth parameter, plan Horizon, is set to 100. Planning is done using 10,000 simulations with uniform action selection, and the previous search tree is reused. All results are averaged over 1,000 independent trials, and Welch s t-test (with p < 0.05) is used for significance testing.