Monte-Carlo Tree Search by Best Arm Identification
Authors: Emilie Kaufmann, Wouter M. Koolen
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees. |
| Researcher Affiliation | Academia | Emilie Kaufmann, CNRS & Univ. Lille, UMR 9189 (CRIStAL), Inria SequeL, Lille, France, emilie.kaufmann@univ-lille1.fr; Wouter M. Koolen, Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands, wmkoolen@cwi.nl |
| Pseudocode | Yes | Figure 2: The BAI-MCTS architecture |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | We evaluate on the depth-two benchmark tree from [10], a new depth-three tree and the random tree ensemble from [22]. |
| Dataset Splits | Yes | All counts are averages over 10K repetitions with ϵ = 0 and δ = 0.1. To replicate the experiment from [22], we supply all algorithms with δ = 0.1 and ϵ = 0.01. On 10K full 10-ary trees of depth 3 with Bernoulli leaf parameters drawn uniformly at random from [0,1], the average numbers of samples are: LUCB-MCTS 141811, UGapE-MCTS 142953 and FindTopWinner 2254560. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., CPU/GPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch or TensorFlow with their versions). |
| Experiment Setup | Yes | To replicate the experiment from [22], we supply all algorithms with δ = 0.1 and ϵ = 0.01. For comparing with [10], we run all algorithms with ϵ = 0 and δ = 0.1·L (undoing the conservative union bound over leaves; this excessive choice, which might even exceed one, does not cause a problem, as the algorithms depend on δ/L = 0.1). For our BAI-MCTS algorithms and for M-LUCB we use the exploration rate β(s,δ) = ln(L/δ) + ln(ln(s) + 1) (a stylized version of Lemma 2 that works well in practice), and we use the KL refinement of the confidence intervals (1). |
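The exploration rate β(s,δ) = ln(L/δ) + ln(ln(s) + 1) and the KL-refined confidence intervals quoted above can be made concrete with a short sketch. This is not the authors' code (none is released); it is a minimal illustration, assuming Bernoulli leaves, where `num_leaves` plays the role of L and the KL upper confidence bound is found by bisection:

```python
import math

def kl_bernoulli(p, q):
    # Bernoulli KL divergence kl(p, q), clamped away from 0 and 1
    # to avoid log(0) at the boundaries.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def exploration_rate(s, delta, num_leaves):
    # Stylized rate from the paper: beta(s, delta) = ln(L/delta) + ln(ln(s) + 1),
    # with s the total sample count and num_leaves = L.
    return math.log(num_leaves / delta) + math.log(math.log(s) + 1)

def kl_upper_bound(mean, count, threshold, tol=1e-6):
    # KL-based upper confidence bound for a Bernoulli mean:
    # the largest q >= mean with count * kl(mean, q) <= threshold,
    # found by bisection (kl(mean, .) is increasing on [mean, 1]).
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count * kl_bernoulli(mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, a leaf sampled 10 times with empirical mean 0.5, under δ = 0.1 and a 3-leaf tree, gets the bound `kl_upper_bound(0.5, 10, exploration_rate(10, 0.1, 3))`; the bound shrinks toward the empirical mean as the count grows, which is what drives the eliminations in the experiments above.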