Monte Carlo Tree Search in the Presence of Transition Uncertainty

Authors: Farnaz Kohankhaki, Kiarash Aghakasiri, Hongming Zhang, Ting-Han Wei, Chao Gao, Martin Müller

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, we evaluate UA-MCTS and its individual components on the deterministic domains from the MinAtar test suite. |
| Researcher Affiliation | Collaboration | University of Alberta; Edmonton Research Center, Huawei Canada |
| Pseudocode | Yes | Algorithm 1: MCTS Framework; Algorithm 2: Uncertainty Adapted Selection Algorithm; Algorithm 3: Uncertainty Adapted Expansion Algorithm; Algorithm 4: Uncertainty Adapted Backpropagation Algorithm |
| Open Source Code | Yes | https://github.com/ualberta-mueller-group/UAMCTS |
| Open Datasets | Yes | We test our method on the three deterministic games Space Invaders, Freeway, and Breakout in the MinAtar framework (Young and Tian 2019). |
| Dataset Splits | No | The paper does not specify traditional training, validation, and test dataset splits with percentages or sample counts. Experiments are conducted in interactive game environments rather than on fixed datasets with such splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for the experiments; it only mentions "a neural network uncertainty model". |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify any software libraries, frameworks, or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For each combination of game and algorithm, a parameter sweep was performed over the exploration constant c from the set {0.5, 1, 2, 2}. For the Combined version in the online scenario, the best c found for the offline scenario was used. In the offline scenario the uncertainty factor τ is set to 0.1 (a small value, so that UA-MCTS is more sensitive to the true uncertainty); in the online scenario τ is initialized to 10 and decays until it reaches 0.1. The uncertainty model in the online scenario is a fully connected neural network with two hidden layers of 128 units each. The number of training steps E and the training frequency I are both 5000, and the step size is 10⁻³ for the Adam optimizer (Kingma and Ba 2014). Table 1 shows a list of other hyperparameters used in UA-MCTS. |
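The paper's Algorithms 1 and 2 describe an MCTS selection step adapted by transition uncertainty, weighted by an exploration constant c and an uncertainty factor τ. The following is a minimal sketch of how such a selection rule could look: it uses the standard UCB1 formula and, as an illustrative assumption (not the exact UA-MCTS formula), discounts each child's score by exp(−uncertainty/τ), so that more uncertain transitions are selected less often and a small τ makes the search more sensitive to uncertainty.

```python
import math

def ucb_score(q, n_child, n_parent, c, uncertainty, tau):
    """UCB1 score with a hypothetical multiplicative uncertainty discount.

    The exp(-uncertainty / tau) down-weighting is an illustrative
    assumption standing in for the paper's uncertainty-adapted rule.
    """
    explore = c * math.sqrt(math.log(n_parent) / n_child)
    return (q + explore) * math.exp(-uncertainty / tau)

def select_child(children, n_parent, c=1.0, tau=0.1):
    """Pick the child index maximizing the uncertainty-discounted score.

    children: list of dicts with keys 'q' (mean value estimate),
    'n' (visit count), and 'u' (estimated transition uncertainty).
    """
    return max(
        range(len(children)),
        key=lambda i: ucb_score(children[i]["q"], children[i]["n"],
                                n_parent, c, children[i]["u"], tau),
    )
```

With two otherwise identical children, the one whose transition is flagged as uncertain is avoided: `select_child([{"q": 0.5, "n": 10, "u": 0.0}, {"q": 0.5, "n": 10, "u": 5.0}], n_parent=20)` returns `0`.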
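In the online scenario the paper states that τ starts at 10 and decays until it reaches 0.1, but the report does not record the decay schedule. A small sketch of one plausible schedule, exponential decay with a floor (the functional form and the rate `decay` are assumptions, not values from the paper):

```python
def tau_schedule(step, tau_init=10.0, tau_min=0.1, decay=0.999):
    """Decay tau exponentially from tau_init toward a floor of tau_min.

    tau_init=10 and tau_min=0.1 come from the paper's online scenario;
    the exponential form and the decay rate are assumptions.
    """
    return max(tau_min, tau_init * (decay ** step))
```

Early in training this keeps τ large (the search is forgiving of model uncertainty while the uncertainty network is still inaccurate), and it settles at the offline value of 0.1 once the model is trained.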