Convex Regularization in Monte-Carlo Tree Search
Authors: Tuan Q Dam, Carlo D’Eramo, Jan Peters, Joni Pajarinen
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the consequence of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated in AlphaGo and we empirically show the superiority of convex regularization, w.r.t. representative baselines, on well-known RL problems across several Atari games. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Technische Universität Darmstadt, Germany; 2Department of Electrical Engineering and Automation, Aalto University, Finland. |
| Pseudocode | No | The paper describes algorithms (e.g., UCT, E3W) and mathematical formulations, but it does not include a clearly labeled pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing its source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Atari. Atari 2600 (Bellemare et al., 2013) is a popular benchmark for testing deep RL methodologies |
| Dataset Splits | No | The paper mentions using a pretrained Deep Q-Network for initialization and conducting experimental runs with MCTS simulations, but it does not provide specific train/validation/test dataset splits for its own model training or evaluation setup. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using a 'Deep Q-Network' and incorporating the framework into 'AlphaGo', but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries) that would be needed for reproducibility. |
| Experiment Setup | Yes | For a fair comparison, we use fixed τ = 0.1 and ϵ = 0.1 across all algorithms. ... Each experimental run consists of 512 MCTS simulations. The temperature τ is optimized for each algorithm and game via grid-search between 0.01 and 1. The discount factor is γ = 0.99, and for PUCT the exploration constant is c = 0.1. |
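Since the authors do not release code, the following Python snippet is only an illustrative sketch that collects the hyperparameters quoted in the Experiment Setup row above; the dictionary name and keys are hypothetical, not from the paper.

```python
# Hypothetical configuration sketch for the reported MCTS/Atari experiments.
# All numeric values come from the Experiment Setup quote above; the structure
# and key names are assumptions for illustration only.
mcts_config = {
    "temperature_tau": 0.1,            # fixed tau used in the fair-comparison runs
    "epsilon": 0.1,                    # fixed epsilon shared across all algorithms
    "num_simulations": 512,            # MCTS simulations per experimental run
    "discount_gamma": 0.99,            # discount factor
    "puct_exploration_c": 0.1,         # exploration constant for PUCT
    "tau_search_range": (0.01, 1.0),   # per-game grid-search range for tau (exact grid not stated)
}

if __name__ == "__main__":
    for name, value in mcts_config.items():
        print(f"{name}: {value}")
```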