Monte Carlo Tree Search in the Presence of Transition Uncertainty
Authors: Farnaz Kohankhaki, Kiarash Aghakasiri, Hongming Zhang, Ting-Han Wei, Chao Gao, Martin Müller
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate UA-MCTS and its individual components on the deterministic domains from the MinAtar test suite. |
| Researcher Affiliation | Collaboration | 1University of Alberta 2Edmonton Research Center, Huawei Canada |
| Pseudocode | Yes | Algorithm 1: MCTS Framework; Algorithm 2: Uncertainty Adapted Selection Algorithm; Algorithm 3: Uncertainty Adapted Expansion Algorithm; Algorithm 4: Uncertainty Adapted Backpropagation Algorithm |
| Open Source Code | Yes | 1https://github.com/ualberta-mueller-group/UAMCTS |
| Open Datasets | Yes | We test our method on the three deterministic games Space Invaders, Freeway, and Breakout in the MinAtar framework (Young and Tian 2019). |
| Dataset Splits | No | The paper does not specify traditional training, validation, and test dataset splits with percentages or sample counts. Experiments are conducted in interactive game environments rather than on fixed datasets with such splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for running the experiments. It only mentions 'A neural network uncertainty model'. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify any software libraries, frameworks, or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For each combination of game and algorithm we performed a parameter sweep over the exploration constant c from the set {0.5, 1, 2, 2}. For the Combined version in the online scenario, we used the best c found for the offline scenario. In the offline scenario the uncertainty factor τ is set to 0.1 (a small number so that UA-MCTS is more sensitive to the true uncertainty), and in the online scenario τ is initialized to 10, then decays until it reaches 0.1. The uncertainty model in the online scenario is a fully connected neural network with two hidden layers of 128 hidden units each. The number of training steps E and training frequency I are both 5000, and the step size is 10⁻³ for the Adam optimizer (Kingma and Ba 2014). Table 1 shows a list of other hyperparameters used in UA-MCTS. |
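The online-scenario schedule for the uncertainty factor τ (initialized to 10, decaying to a floor of 0.1) can be sketched as follows. This is a minimal illustration only: the report does not specify the decay form, so the multiplicative rate used here is an assumed placeholder, not the paper's actual schedule.

```python
def tau_schedule(step, tau_init=10.0, tau_min=0.1, decay=0.999):
    """Return the uncertainty factor tau after `step` decay steps.

    tau starts at `tau_init` (10 in the online scenario) and decays
    multiplicatively until it is clamped at `tau_min` (0.1).
    The decay rate 0.999 is an illustrative assumption.
    """
    return max(tau_min, tau_init * decay ** step)


if __name__ == "__main__":
    print(tau_schedule(0))        # 10.0 at the start
    print(tau_schedule(1_000_000))  # clamped at the 0.1 floor
```

A schedule like this keeps UA-MCTS close to standard MCTS early on (large τ), then gradually makes it more sensitive to the learned transition uncertainty as τ approaches the offline setting of 0.1.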