Monte-Carlo Tree Search by Best Arm Identification

Authors: Emilie Kaufmann, Wouter M. Koolen

NeurIPS 2017

Reproducibility Variable Result LLM Response
Research Type Experimental We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees.
Researcher Affiliation Academia Emilie Kaufmann, CNRS & Univ. Lille, UMR 9189 (CRIStAL), Inria SequeL, Lille, France, emilie.kaufmann@univ-lille1.fr; Wouter M. Koolen, Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands, wmkoolen@cwi.nl
Pseudocode Yes Figure 2: The BAI-MCTS architecture
Open Source Code No The paper does not provide any statement or link indicating that the source code for their methodology is publicly available.
Open Datasets Yes We evaluate on the depth-two benchmark tree from [10], a new depth-three tree and the random tree ensemble from [22].
Dataset Splits Yes All counts are averages over 10K repetitions with ϵ = 0 and δ = 0.1. To replicate the experiment from [22], we supply all algorithms with δ = 0.1 and ϵ = 0.01. On 10K full 10-ary trees of depth 3 with Bernoulli leaf parameters drawn uniformly at random from [0,1] the average numbers of samples are: LUCB-MCTS 141811, UGapE-MCTS 142953 and FindTopWinner 2254560.
Hardware Specification No The paper does not provide any specific details about the hardware used for running experiments (e.g., CPU/GPU models, memory, or cloud instances).
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch or TensorFlow with their versions).
Experiment Setup Yes To replicate the experiment from [22], we supply all algorithms with δ = 0.1 and ϵ = 0.01. For comparing with [10] we run all algorithms with ϵ = 0 and δ = 0.1·L (undoing the conservative union bound over leaves; this excessive choice, which might even exceed one, does not cause a problem, as the algorithms depend on δ/L = 0.1). For our BAI-MCTS algorithms and for M-LUCB we use the exploration rate β(s,δ) = ln(L/δ) + ln(ln(s) + 1) (a stylized version of Lemma 2 that works well in practice), and we use the KL refinement of the confidence intervals (1).
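The quoted setup combines the stylized exploration rate β(s,δ) = ln(L/δ) + ln(ln(s) + 1) with KL-based confidence intervals for Bernoulli leaves. A minimal sketch of these two ingredients is below; the helper names are mine (not the authors' code), and the KL upper bound is computed by simple bisection, which is one standard way to invert the binary KL divergence.

```python
import math

def beta(s, delta, L):
    """Stylized exploration rate quoted in the report:
    beta(s, delta) = ln(L/delta) + ln(ln(s) + 1), for s >= 1 samples
    and L leaves. Hypothetical helper, not the authors' implementation."""
    return math.log(L / delta) + math.log(math.log(s) + 1)

def kl_bernoulli(p, q):
    """Binary KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from 0 and 1 to keep the logs finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(p_hat, n, threshold):
    """KL upper confidence bound for a Bernoulli mean: the largest
    q in [p_hat, 1] such that n * kl(p_hat, q) <= threshold,
    found by bisection (kl(p_hat, .) is increasing on [p_hat, 1])."""
    lo, hi = p_hat, 1.0
    for _ in range(60):  # 60 halvings give far more than float precision
        mid = (lo + hi) / 2
        if n * kl_bernoulli(p_hat, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, a leaf with empirical mean 0.5 after 10 pulls, in a tree with L = 3 leaves and δ = 0.1, would get the upper confidence bound `kl_ucb(0.5, 10, beta(10, 0.1, 3))`; the bound shrinks toward the empirical mean as the pull count n grows.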