Convex Regularization in Monte-Carlo Tree Search

Authors: Tuan Q Dam, Carlo D’Eramo, Jan Peters, Joni Pajarinen

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify the consequence of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated in AlphaGo and we empirically show the superiority of convex regularization, w.r.t. representative baselines, on well-known RL problems across several Atari games.
Researcher Affiliation | Academia | 1 Department of Computer Science, Technische Universität Darmstadt, Germany; 2 Department of Electrical Engineering and Automation, Aalto University, Finland.
Pseudocode | No | The paper describes algorithms (e.g., UCT, E3W) and mathematical formulations, but it does not include a clearly labeled pseudocode block or algorithm listing (an illustrative sketch of such a regularized selection/backup step is given after the table).
Open Source Code | No | The paper does not contain any explicit statement about releasing its source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Atari. Atari 2600 (Bellemare et al., 2013) is a popular benchmark for testing deep RL methodologies.
Dataset Splits | No | The paper mentions using a pretrained Deep Q-Network for initialization and conducting experimental runs with MCTS simulations, but it does not provide specific train/validation/test dataset splits for its own model training or evaluation setup.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using a 'Deep Q-Network' and incorporating the framework into 'AlphaGo', but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries) that would be needed for reproducibility.
Experiment Setup | Yes | For a fair comparison, we use fixed τ = 0.1 and ϵ = 0.1 across all algorithms. ... Each experimental run consists of 512 MCTS simulations. The temperature τ is optimized for each algorithm and game via grid-search between 0.01 and 1. The discount factor is γ = 0.99, and for PUCT the exploration constant is c = 0.1. (These reported values are collected in the configuration sketch below.)
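
To make the selection and backup rules referenced in the Pseudocode row concrete, the following is a minimal sketch of an entropy-regularized MCTS step in the spirit of E3W: for the negative-entropy regularizer, the regularized node value is a log-sum-exp of the child Q-values and the corresponding greedy policy is a softmax, which is mixed with a decaying uniform exploration term. The function names, the exploration-decay schedule, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def logsumexp_value(q_values, tau=0.1):
    """Regularized node value for the negative-entropy regularizer:
    Omega*(Q) = tau * log sum_a exp(Q(a) / tau), computed with the
    usual max-shift for numerical stability."""
    q = np.asarray(q_values, dtype=float)
    m = q.max()
    return m + tau * np.log(np.exp((q - m) / tau).sum())

def softmax_policy(q_values, tau=0.1):
    """Gradient of the log-sum-exp value: a softmax over child Q-values."""
    q = np.asarray(q_values, dtype=float)
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

def e3w_style_selection(q_values, visit_count, tau=0.1, eps=0.1, rng=None):
    """Sample an action from a mixture of the softmax policy and a uniform
    exploration term whose weight decays with the node visit count.
    The decay schedule below is an assumption, not the paper's exact rule."""
    rng = rng or np.random.default_rng()
    n_actions = len(q_values)
    lam = min(1.0, eps * n_actions / np.log(visit_count + 2))
    probs = (1 - lam) * softmax_policy(q_values, tau) + lam / n_actions
    return rng.choice(n_actions, p=probs)
```

In use, the tree policy would call e3w_style_selection at each visited node and back up logsumexp_value in place of the usual mean value; UCT/PUCT baselines keep their standard selection scores and average backups.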
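
The hyperparameters quoted in the Experiment Setup row can also be gathered in one place. The dictionary below only restates the reported values; the key names and the config format are an assumed convention, not something taken from the paper.

```python
# Values reported in the Experiment Setup row above; names are illustrative.
mcts_config = {
    "temperature_tau": 0.1,          # fixed tau in the fair-comparison runs
    "exploration_eps": 0.1,          # fixed epsilon across all algorithms
    "num_simulations": 512,          # MCTS simulations per experimental run
    "discount_gamma": 0.99,          # discount factor
    "puct_exploration_c": 0.1,       # PUCT exploration constant
    "tau_grid_search_range": (0.01, 1.0),  # per-algorithm/game tuning range
}
```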