Monte-Carlo Tree Search as Regularized Policy Optimization
Authors: Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In Section 5, we show that this modified algorithm outperforms AlphaZero on Atari games and continuous control tasks.' The paper contains a dedicated section titled '5. Experiments' with sub-sections such as '5.1. Search with low simulation budgets', '5.2. Ablation study', and '5.3. Search with large action space continuous control', featuring comparative performance figures (e.g., Figure 2: 'Comparison of median scores of MuZero (red) and ALL (blue) at Nsim = 5 (dotted line) and Nsim = 50 (solid line) simulations per step on Ms Pacman (Atari). Averaged across 8 seeds.'). |
| Researcher Affiliation | Collaboration | 1: DeepMind, Paris, FR; 2: Columbia University, New York, USA; 3: DeepMind, London, UK. |
| Pseudocode | No | The paper describes algorithms and concepts in prose and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks (see the illustrative sketch after this table). |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described. |
| Open Datasets | Yes | The paper evaluates on 'Ms Pacman (Atari) (Bellemare et al., 2013)' and the 'Cheetah Run environment of the DeepMind Control Suite (Tassa et al., 2018)'. These are well-known public benchmarks cited in the paper. |
| Dataset Splits | No | The paper uses well-known benchmarks (Atari and the DeepMind Control Suite) but does not explicitly provide percentages or counts for training, validation, or test splits needed to reproduce any data partitioning. |
| Hardware Specification | Yes | The acknowledgements section mentions 'Cloud TPU Google Cloud. https://cloud.google.com/tpu/', indicating the use of Google's Tensor Processing Units for experiments. |
| Software Dependencies | No | The paper mentions the use of deep learning frameworks and standard RL algorithms, but it does not specify any software dependencies with version numbers (e.g., Python version, specific library versions like TensorFlow or PyTorch). |
| Experiment Setup | Yes | 'Hyper-parameters of the algorithms are tuned to achieve the maximum possible performance for baseline MuZero on the Ms Pacman level of the Atari suite (Bellemare et al., 2013), and are identical in all experiments with the exception of the number of simulations per step Nsim.' |
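
Because the Pseudocode and Open Source Code rows above are both 'No', the sketch below illustrates the kind of computation the paper describes in prose: solving a KL-regularized policy optimization of the form argmax_pi [q^T pi − lambda · KL(pi_theta, pi)] over the probability simplex, whose stationary solution pi(a) ∝ lambda · pi_theta(a) / (alpha − q(a)) is normalized by a one-dimensional search on alpha. This is a hedged reconstruction for illustration only, not the authors' implementation; the function name, NumPy framing, bisection tolerance, and example values are assumptions.

```python
import numpy as np


def regularized_policy(q_values: np.ndarray,
                       prior: np.ndarray,
                       lam: float,
                       tol: float = 1e-8) -> np.ndarray:
    """Sketch: maximize q^T pi - lam * KL(prior, pi) over the probability simplex.

    The stationarity condition gives pi(a) = lam * prior(a) / (alpha - q(a)),
    with the scalar alpha chosen by bisection so the entries sum to 1.
    """
    # A valid alpha lies between max_a(q(a) + lam * prior(a)), where the total
    # mass is >= 1, and max_a q(a) + lam, where the total mass is <= 1.
    lo = np.max(q_values + lam * prior)
    hi = np.max(q_values) + lam

    def mass(alpha: float) -> float:
        return float(np.sum(lam * prior / (alpha - q_values)))

    # The total mass decreases monotonically in alpha, so bisect the bracket.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid

    pi = lam * prior / (0.5 * (lo + hi) - q_values)
    return pi / pi.sum()  # absorb residual numerical error


# Example (hypothetical values): uniform prior over three actions, lam = 1.0.
print(regularized_policy(np.array([0.1, 0.5, 0.4]),
                         np.array([1 / 3, 1 / 3, 1 / 3]),
                         lam=1.0))
```

Bisection is used here only because the normalizing mass is monotone in alpha; any scalar root-finder would serve equally well.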