Generalized Mean Estimation in Monte-Carlo Tree Search
Authors: Tuan Dam, Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Technische Universität Darmstadt, Germany; 2Robot Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3Computing Sciences, Tampere University, Finland |
| Pseudocode | Yes | Algorithm 1 Power-UCT |
| Open Source Code | No | No explicit statement about providing open-source code for their methodology or a link to a repository. |
| Open Datasets | Yes | For MDPs, we consider the well-known Frozen Lake problem as implemented in OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not specify explicit training, validation, and test dataset splits by percentage or sample count. It mentions 'evaluation runs' but not data partitioning for model training and selection. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or computational resources used for running experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | For MENTS we find the best combination of the two hyperparameters by grid search. In MDP tasks, we find the UCT exploration constant using grid search. For Power-UCT, we find the p-value by increasing it until performance starts to decrease. |
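The p-value tuned in the experiment setup above is the exponent of the weighted power (generalized) mean that Power-UCT uses as its value backup in place of the arithmetic average. A minimal sketch of that statistic is shown below; the function name and signature are illustrative, not taken from the paper's code:

```python
def power_mean(values, weights, p):
    """Weighted power (generalized) mean: (sum_i w_i * x_i**p) ** (1/p),
    with the weights normalized to sum to one.

    p = 1 recovers the ordinary weighted average; as p grows, the
    estimate moves toward the maximum of the values, which is the
    interpolation between average and maximum backups that Power-UCT
    exploits (increasing p until performance starts to decrease).
    """
    total_w = sum(weights)
    return sum((w / total_w) * v ** p for v, w in zip(values, weights)) ** (1.0 / p)
```

For example, with equal weights over the values 1, 2, 3, the mean is 2.0 at p = 1 and approaches 3.0 (the maximum) as p increases.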