Maximum Entropy Monte-Carlo Planning
Authors: Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results also demonstrate that MENTS is more sample efficient than UCT in both synthetic problems and Atari 2600 games. |
| Researcher Affiliation | Collaboration | Chenjun Xiao¹, Jincheng Mei¹, Ruitong Huang², Dale Schuurmans¹, Martin Müller¹ (¹University of Alberta, ²Borealis AI) |
| Pseudocode | No | Section 4.1 “Algorithmic Design” describes the steps of MENTS, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We then test MENTS on five Atari games: Beam Rider, Breakout, Q*bert, Seaquest and Space Invaders. |
| Dataset Splits | No | The paper mentions training a DQN model, but it does not provide specific details about dataset splits (e.g., train/validation/test percentages or counts) for the experiments conducted with MENTS or UCT. It refers to an Appendix for setup details, which is not provided. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a “vanilla DQN” but does not specify any software dependencies (e.g., libraries or solvers) with version numbers that would be needed to replicate the experiment setup. |
| Experiment Setup | Yes | The temperature is set to 0.1. At each time step we use 500 simulations to generate a move. The UCT algorithm adopts the following tree policy introduced in AlphaGo [13]: PUCT(s, a) = Q(s, a) + c · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a)). |
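
The "Pseudocode" row above notes that Section 4.1 describes the steps of MENTS without a labeled algorithm block. Below is a minimal sketch of the two ingredients that description centers on, the softmax (log-sum-exp) value backup and an E2W-style sampling policy. The function names, the NumPy implementation, and the exact exploration-decay schedule are our assumptions, not the authors' code.

```python
import numpy as np

def softmax_value(q_values, tau):
    """Soft value backup: V(s) = tau * log(sum_a exp(Q(s, a) / tau))."""
    q = np.asarray(q_values, dtype=float) / tau
    m = q.max()  # subtract the max before exponentiating for numerical stability
    return tau * (m + np.log(np.exp(q - m).sum()))

def e2w_probs(q_values, visit_counts, tau, epsilon):
    """E2W-style tree policy: a softmax over soft Q-values mixed with a uniform
    distribution whose weight decays as the node is visited more often."""
    q = np.asarray(q_values, dtype=float)
    n_actions = len(q)
    total_visits = float(np.sum(visit_counts))
    # Assumed decay schedule for the exploration weight lambda.
    lam = min(1.0, epsilon * n_actions / np.log(total_visits + 2.0))
    z = np.exp((q - q.max()) / tau)
    softmax = z / z.sum()
    return (1.0 - lam) * softmax + lam / n_actions
```

In such a sketch, an action at each node would be sampled from `e2w_probs`, and the value returned by `softmax_value` would be propagated to the parent in place of UCT's average return.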
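
For the UCT baseline quoted in the "Experiment Setup" row, the sketch below shows how the AlphaGo-style PUCT rule scores and selects actions; the per-action statistics layout (a dict with keys "Q", "P", "N") and the exploration constant `c` are illustrative assumptions.

```python
import math

def puct_score(q, prior, child_visits, parent_visits, c=1.0):
    """PUCT(s, a) = Q(s, a) + c * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_action(stats, c=1.0):
    """Pick the action with the highest PUCT score at a node.
    `stats` maps each action to {"Q": value estimate, "P": prior, "N": visit count}."""
    parent_visits = sum(child["N"] for child in stats.values())
    return max(stats, key=lambda a: puct_score(stats[a]["Q"], stats[a]["P"],
                                               stats[a]["N"], parent_visits, c))

# Hypothetical two-action node: the less-visited action with the higher value wins.
stats = {"left": {"Q": 0.2, "P": 0.6, "N": 10}, "right": {"Q": 0.5, "P": 0.4, "N": 3}}
print(select_action(stats))  # -> "right" for these numbers
```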