Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction

Authors: Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms, such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with 10%-45% search space reduction.
Researcher Affiliation | Academia | Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao. MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. {frank79110, mingsun, niebuqing, yuegao}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1 PTSAZero
Open Source Code | Yes | Code available at https://github.com/FYQ0919/PTSA-MCTS
Open Datasets | Yes | To demonstrate the improvement in computational efficiency, PTSA is integrated with Sampled MuZero in Atari and classic control benchmarks, as well as with Gumbel MuZero in the Gomoku benchmark.
Dataset Splits | No | The paper reports results averaged over "10 seeds" with normalized scores, but does not provide training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits for reproducibility). (A hedged seed-aggregation sketch follows the table.)
Hardware Specification | No | The paper mentions "1000 TPUs" and "40 TPUs" in the introduction, in the context of MuZero's training, but does not specify the hardware used for its own experiments (e.g., exact GPU/CPU models, processor types, or memory amounts).
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., Python 3.8, CPLEX 12.4) needed to replicate the experiments.
Experiment Setup | Yes | For both SMuZero and PTSAZero, the number of sampled actions is set to 25 in CartPole-v0 and 12 in LunarLander-v2. [...] We adjusted hyperparameters for different state abstraction functions and selected the best values (ϵ and d are set to 0.5 and 0.2, respectively). (A hedged abstraction-threshold sketch follows the table.)
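
To make the quoted hyperparameters concrete, below is a minimal, hypothetical sketch of how a threshold-based state-abstraction test inside MCTS might use ϵ and d. It is not the authors' PTSA algorithm: the function names, the total-variation distance, and the value-gap test are illustrative assumptions; only the threshold values (ϵ = 0.5, d = 0.2) come from the quoted setup.

    import numpy as np

    # Hypothetical sketch of a threshold-based state-abstraction test for MCTS.
    # NOT the paper's PTSA algorithm: the distance measure and aggregation rule
    # are assumptions; only eps = 0.5 and d = 0.2 come from the quoted setup.

    def total_variation(p, q):
        """Total-variation distance between two action distributions."""
        return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

    def should_aggregate(policy_a, policy_b, value_a, value_b, eps=0.5, d=0.2):
        """Treat two tree states as one abstract state when their policies and
        value estimates are close, shrinking the effective search space."""
        return (total_variation(policy_a, policy_b) <= eps
                and abs(value_a - value_b) <= d)

    # Example: two nearly identical nodes would be merged under these thresholds.
    print(should_aggregate([0.7, 0.2, 0.1], [0.65, 0.25, 0.1], 0.42, 0.38))  # True

Merging such nodes (e.g., sharing visit counts and value statistics) is one plausible way a 10%-45% search-space reduction could arise; the abstraction functions actually used should be checked against the released code.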
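On the Dataset Splits row: since the paper reports normalized scores over 10 seeds rather than dataset splits, a reproducer would aggregate per-seed results roughly as follows. The raw scores and baseline values below are placeholders, not numbers from the paper; the (raw - random) / (human - random) normalization is the standard convention for Atari-style benchmarks.

    import numpy as np

    # Standard human-normalized score, averaged over independent training seeds.
    # All numbers below are placeholders, not results from the paper.

    def normalized_score(raw, random_score, human_score):
        """(raw - random) / (human - random), the usual Atari normalization."""
        return (raw - random_score) / (human_score - random_score)

    raw_scores = np.array([612.0, 580.0, 640.0, 605.0, 598.0,
                           621.0, 575.0, 630.0, 610.0, 590.0])  # 10 seeds (placeholder)
    random_score, human_score = 150.0, 1200.0                   # placeholder baselines

    scores = normalized_score(raw_scores, random_score, human_score)
    print(f"normalized score: {scores.mean():.3f} +/- {scores.std(ddof=1):.3f}")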