Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction

Authors: Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms, such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with 10%-45% search space reduction.
Researcher Affiliation | Academia | Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao. MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. {frank79110, mingsun, niebuqing, yuegao}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1 PTSAZero
Open Source Code | Yes | Code available at https://github.com/FYQ0919/PTSA-MCTS
Open Datasets | Yes | To demonstrate the improvement in computational efficiency, PTSA is integrated with Sampled MuZero in Atari and classic control benchmarks, as well as with Gumbel MuZero in the Gomoku benchmark.
Dataset Splits | No | The paper reports results averaged over "10 seeds" with normalized scores, but does not provide training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits for reproducibility). (A hedged seed-aggregation sketch follows the table.)
Hardware Specification | No | The paper mentions "1000 TPUs" and "40 TPUs" in the introduction, in the context of MuZero's training, but does not specify the hardware used for its own experiments (e.g., exact GPU/CPU models, processor types, or memory amounts).
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., Python 3.8, CPLEX 12.4) needed to replicate the experiments.
Experiment Setup | Yes | For both SMuZero and PTSAZero, the number of sampled actions is set to 25 in CartPole-v0 and 12 in LunarLander-v2. [...] We adjusted hyperparameters for different state abstraction functions and selected the best values (ϵ and d are set to 0.5 and 0.2, respectively). (A hedged abstraction-threshold sketch follows the table.)
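
To make the quoted hyperparameters concrete, below is a minimal, hypothetical sketch of how a threshold-based state-abstraction test inside MCTS might use ϵ and d. It is not the authors' PTSA algorithm: the function names, the total-variation distance, and the value-gap test are illustrative assumptions; only the threshold values (ϵ = 0.5, d = 0.2) come from the quoted setup.

    import numpy as np

    # Hypothetical sketch of a threshold-based state-abstraction test for MCTS.
    # NOT the paper's PTSA algorithm: the distance measure and aggregation rule
    # are assumptions; only eps = 0.5 and d = 0.2 come from the quoted setup.

    def total_variation(p, q):
        """Total-variation distance between two action distributions."""
        return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

    def should_aggregate(policy_a, policy_b, value_a, value_b, eps=0.5, d=0.2):
        """Treat two tree states as one abstract state when their policies and
        value estimates are close, shrinking the effective search space."""
        return (total_variation(policy_a, policy_b) <= eps
                and abs(value_a - value_b) <= d)

    # Example: two nearly identical nodes would be merged under these thresholds.
    print(should_aggregate([0.7, 0.2, 0.1], [0.65, 0.25, 0.1], 0.42, 0.38))  # True

Merging such nodes (e.g., sharing visit counts and value statistics) is one plausible way a 10%-45% search-space reduction could arise; the abstraction functions actually used should be checked against the released code.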
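On the Dataset Splits row: since the paper reports normalized scores over 10 seeds rather than dataset splits, a reproducer would aggregate per-seed results roughly as follows. The raw scores and baseline values below are placeholders, not numbers from the paper; the (raw - random) / (human - random) normalization is the standard convention for Atari-style benchmarks.

    import numpy as np

    # Standard human-normalized score, averaged over independent training seeds.
    # All numbers below are placeholders, not results from the paper.

    def normalized_score(raw, random_score, human_score):
        """(raw - random) / (human - random), the usual Atari normalization."""
        return (raw - random_score) / (human_score - random_score)

    raw_scores = np.array([612.0, 580.0, 640.0, 605.0, 598.0,
                           621.0, 575.0, 630.0, 610.0, 590.0])  # 10 seeds (placeholder)
    random_score, human_score = 150.0, 1200.0                   # placeholder baselines

    scores = normalized_score(raw_scores, random_score, human_score)
    print(f"normalized score: {scores.mean():.3f} +/- {scores.std(ddof=1):.3f}")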