Generalized Mean Estimation in Monte-Carlo Tree Search

Authors: Tuan Dam, Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen

IJCAI 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We theoretically analyze our method, providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state-of-the-art algorithms. |
| Researcher Affiliation | Academia | 1. Department of Computer Science, Technische Universität Darmstadt, Germany; 2. Robot Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3. Computing Sciences, Tampere University, Finland |
| Pseudocode | Yes | Algorithm 1: Power-UCT (a minimal sketch of the power-mean backup appears below the table) |
| Open Source Code | No | No explicit statement about providing open-source code for the methodology, and no link to a repository. |
| Open Datasets | Yes | For MDPs, we consider the well-known FrozenLake problem as implemented in OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not specify explicit training, validation, and test dataset splits by percentage or sample count; it mentions "evaluation runs" but no data partitioning for model training and selection. |
| Hardware Specification | No | The paper does not report hardware details such as CPU/GPU models, memory, or other computational resources used for the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | For MENTS, we find the best combination of the two hyperparameters by grid search. In MDP tasks, we find the UCT exploration constant using grid search. For Power-UCT, we find the p-value by increasing it until performance starts to decrease. (Both recipes are sketched below.) |
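The Pseudocode row points to Algorithm 1, Power-UCT, whose key change to UCT is backing up node values with a visit-weighted power mean instead of an arithmetic mean. Below is a minimal Python sketch of that backup step, assuming non-negative child values and a finite p >= 1; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def power_mean_backup(child_values, child_visits, p):
    """Visit-weighted power mean of child action values.

    At p = 1 this reduces to UCT's ordinary weighted average; as p
    grows it approaches the maximum child value, which is the
    interpolation Power-UCT exploits. Assumes non-negative values
    (e.g., returns normalized to [0, 1]).
    """
    q = np.asarray(child_values, dtype=float)
    n = np.asarray(child_visits, dtype=float)
    w = n / n.sum()                  # visit-count weights
    return (w @ q**p) ** (1.0 / p)   # (sum_i w_i * q_i^p)^(1/p)

# p = 1 recovers the weighted mean; larger p leans toward the max.
values, visits = [0.2, 0.8, 0.5], [10, 30, 20]
print(power_mean_backup(values, visits, p=1.0))   # 0.6 (weighted mean)
print(power_mean_backup(values, visits, p=10.0))  # ~0.75, nearer the max
```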
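The Open Datasets row cites the FrozenLake environment from OpenAI Gym. A minimal interaction loop is sketched below; note that the env id ("FrozenLake-v1") and the five-value step return follow Gym >= 0.26, whereas the 2016 release the paper cites exposed "FrozenLake-v0" and a four-value return.

```python
import gym

# FrozenLake: a small stochastic grid-world MDP. With is_slippery=True
# the agent may slide to an unintended neighboring cell, which is what
# makes the planning problem non-trivial.
env = gym.make("FrozenLake-v1", is_slippery=True)

obs, info = env.reset(seed=0)
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated
print("episode return:", episode_return)
```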
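The Experiment Setup row describes two tuning recipes: grid search for the UCT exploration constant (and the MENTS hyperparameters), and increasing Power-UCT's p until performance starts to decrease. A sketch of both follows, where `evaluate` is a hypothetical callable returning mean return over a batch of evaluation runs for the given hyperparameters.

```python
def grid_search(candidates, evaluate):
    """Pick the exploration constant with the best evaluation score."""
    return max(candidates, key=lambda c: evaluate(exploration_constant=c))

def tune_p(evaluate, p=1.0, factor=2.0):
    """Increase p until performance starts to decrease, then keep the
    last value that still improved (the paper's stated recipe)."""
    best = evaluate(p=p)
    while True:
        score = evaluate(p=p * factor)
        if score <= best:
            return p
        p, best = p * factor, score

# Hypothetical usage, with evaluate closing over an agent and environment:
# best_c = grid_search([0.1, 0.5, 1.0, 2.0], evaluate)
# best_p = tune_p(evaluate)
```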