Monte-Carlo Tree Search as Regularized Policy Optimization

Authors: Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In Section 5, we show that this modified algorithm outperforms AlphaZero on Atari games and continuous control tasks.' The paper contains a dedicated section titled '5. Experiments' with sub-sections such as '5.1. Search with low simulation budgets', '5.2. Ablation study', and '5.3. Search with large action space continuous control', featuring comparative performance figures (e.g., Figure 2: 'Comparison of median scores of MuZero (red) and ALL (blue) at Nsim = 5 (dotted line) and Nsim = 50 (solid line) simulations per step on Ms Pacman (Atari). Averaged across 8 seeds.').
Researcher Affiliation | Collaboration | 1DeepMind, Paris, FR; 2Columbia University, New York, USA; 3DeepMind, London, UK.
Pseudocode | No | The paper describes algorithms and concepts in prose and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | 'Ms Pacman (Atari) (Bellemare et al., 2013)' and the 'Cheetah Run environment of the DeepMind Control Suite (Tassa et al., 2018)'. These are well-known public benchmarks cited in the paper (see the environment-loading sketch below the table).
Dataset Splits | No | The paper uses well-known benchmarks such as Atari and the DeepMind Control Suite, but it does not explicitly provide percentages or counts for training, validation, or test splits needed to reproduce the data partitioning.
Hardware Specification | Yes | The acknowledgements mention 'Cloud TPU. Google Cloud. https://cloud.google.com/tpu/', indicating that experiments used Google Cloud TPUs.
Software Dependencies | No | The paper mentions deep learning frameworks and standard RL algorithms, but it does not specify software dependencies with version numbers (e.g., Python version, or specific library versions such as TensorFlow or PyTorch).
Experiment Setup | Yes | 'Hyper-parameters of the algorithms are tuned to achieve the maximum possible performance for baseline MuZero on the Ms Pacman level of the Atari suite (Bellemare et al., 2013), and are identical in all experiments with the exception of the number of simulations per step Nsim.' (A hypothetical sketch of the resulting evaluation loop appears below the table.)
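
As a supplement to the Open Datasets row, here is a minimal sketch, not from the paper, of how the two public benchmarks could be loaded; it assumes the gym (with Atari extras) and dm_control packages, and the environment identifiers are standard package names rather than anything taken from the authors' code.

```python
# Minimal sketch, not from the paper: loading the two public benchmarks
# referenced above, assuming the gym (with Atari extras) and dm_control packages.
import gym
from dm_control import suite

# Ms Pacman (Atari, via the Arcade Learning Environment exposed through gym).
atari_env = gym.make("MsPacmanNoFrameskip-v4")
atari_obs = atari_env.reset()

# Cheetah Run (DeepMind Control Suite).
dmc_env = suite.load(domain_name="cheetah", task_name="run")
dmc_timestep = dmc_env.reset()

print(atari_env.action_space)   # discrete Atari action space
print(dmc_env.action_spec())    # continuous control action specification
```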
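
Purely as a hypothetical illustration of the setup quoted in the Experiment Setup row: the only quantity varied across experiments is the number of simulations per step Nsim (5 or 50 in Figure 2, averaged over 8 seeds); run_experiment below is an invented placeholder, not code released by the authors.

```python
# Hypothetical sketch of the evaluation protocol described in the table above.
# The Nsim values (5, 50) and the 8 seeds come from the quoted Figure 2 caption;
# run_experiment is an invented placeholder, not code released by the authors.
def run_experiment(algorithm: str, num_simulations: int, seed: int) -> None:
    """Train and evaluate one configuration on Ms Pacman (placeholder)."""
    ...

for algorithm in ("MuZero", "ALL"):       # baseline vs. the paper's variant
    for num_simulations in (5, 50):       # Nsim, the only hyper-parameter varied
        for seed in range(8):             # scores averaged across 8 seeds
            run_experiment(algorithm, num_simulations, seed)
```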