Monte-Carlo Tree Search as Regularized Policy Optimization
Authors: Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In Section 5, we show that this modified algorithm outperforms AlphaZero on Atari games and continuous control tasks.' The paper contains a dedicated section titled '5. Experiments' with sub-sections such as '5.1. Search with low simulation budgets', '5.2. Ablation study', and '5.3. Search with large action space continuous control', featuring comparative performance figures (e.g., Figure 2: 'Comparison of median scores of MuZero (red) and ALL (blue) at Nsim = 5 (dotted line) and Nsim = 50 (solid line) simulations per step on Ms Pacman (Atari). Averaged across 8 seeds.'). |
| Researcher Affiliation | Collaboration | 1: DeepMind, Paris, FR; 2: Columbia University, New York, USA; 3: DeepMind, London, UK. |
| Pseudocode | No | The paper describes algorithms and concepts in prose and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks (see the illustrative sketch after this table). |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described. |
| Open Datasets | Yes | The paper evaluates on 'Ms Pacman (Atari) (Bellemare et al., 2013)' and the 'Cheetah Run environment of the DeepMind Control Suite (Tassa et al., 2018)'. These are well-known public benchmarks cited in the paper. |
| Dataset Splits | No | The paper uses well-known benchmarks (Atari and the DeepMind Control Suite) but does not explicitly provide percentages or counts for training, validation, or test splits needed to reproduce any data partitioning. |
| Hardware Specification | Yes | The acknowledgements section mentions 'Cloud TPU Google Cloud. https://cloud.google.com/tpu/', indicating the use of Google's Tensor Processing Units for experiments. |
| Software Dependencies | No | The paper mentions the use of deep learning frameworks and standard RL algorithms, but it does not specify any software dependencies with version numbers (e.g., Python version, specific library versions like TensorFlow or PyTorch). |
| Experiment Setup | Yes | 'Hyper-parameters of the algorithms are tuned to achieve the maximum possible performance for baseline MuZero on the Ms Pacman level of the Atari suite (Bellemare et al., 2013), and are identical in all experiments with the exception of the number of simulations per step Nsim.' |
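
Because the Pseudocode and Open Source Code rows above are both 'No', the sketch below illustrates the kind of computation the paper describes in prose: solving a KL-regularized policy optimization of the form argmax_pi [q^T pi − lambda · KL(pi_theta, pi)] over the probability simplex, whose stationary solution pi(a) ∝ lambda · pi_theta(a) / (alpha − q(a)) is normalized by a one-dimensional search on alpha. This is a hedged reconstruction for illustration only, not the authors' implementation; the function name, NumPy framing, bisection tolerance, and example values are assumptions.

```python
import numpy as np


def regularized_policy(q_values: np.ndarray,
                       prior: np.ndarray,
                       lam: float,
                       tol: float = 1e-8) -> np.ndarray:
    """Sketch: maximize q^T pi - lam * KL(prior, pi) over the probability simplex.

    The stationarity condition gives pi(a) = lam * prior(a) / (alpha - q(a)),
    with the scalar alpha chosen by bisection so the entries sum to 1.
    """
    # A valid alpha lies between max_a(q(a) + lam * prior(a)), where the total
    # mass is >= 1, and max_a q(a) + lam, where the total mass is <= 1.
    lo = np.max(q_values + lam * prior)
    hi = np.max(q_values) + lam

    def mass(alpha: float) -> float:
        return float(np.sum(lam * prior / (alpha - q_values)))

    # The total mass decreases monotonically in alpha, so bisect the bracket.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid

    pi = lam * prior / (0.5 * (lo + hi) - q_values)
    return pi / pi.sum()  # absorb residual numerical error


# Example (hypothetical values): uniform prior over three actions, lam = 1.0.
print(regularized_policy(np.array([0.1, 0.5, 0.4]),
                         np.array([1 / 3, 1 / 3, 1 / 3]),
                         lam=1.0))
```

Bisection is used here only because the normalizing mass is monotone in alpha; any scalar root-finder would serve equally well.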