State Aggregation in Monte Carlo Tree Search
Authors: Jesse Hostetler, Alan Fern, Tom Dietterich
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a proof of concept, we experimentally confirm that state aggregation can improve the finite-sample performance of UCT. This section presents a small experiment that demonstrates the sample complexity benefits of abstraction. |
| Researcher Affiliation | Academia | Jesse Hostetler, Alan Fern, and Tom Dietterich. Department of Electrical Engineering and Computer Science, Oregon State University. {hostetje, afern, tgd}@eecs.oregonstate.edu |
| Pseudocode | No | The paper describes algorithms and their modifications in paragraph form but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | No | Our experimental domain is a version of the card game Blackjack. We play to a maximum score of 32, instead of 21 for ordinary Blackjack. This makes the planning horizon longer, which allows abstraction to have a larger effect. We draw from an infinite deck so that card counting is not helpful, and we do not allow doubling down, splitting pairs, or surrendering. The paper describes a custom experimental domain based on Blackjack but does not provide any concrete access information (link, DOI, citation) to a publicly available dataset. |
| Dataset Splits | No | The paper mentions running experiments for "varying sample limits" and measuring "average return over 10^5 games" but does not provide specific details on training, validation, or test splits of any dataset. |
| Hardware Specification | No | No specific hardware details are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | No | We ran χ-UCT with the four representations for varying sample limits. The performance measure is the average return over 10^5 games. While the paper describes the experimental task (Blackjack variation) and the number of games played, it does not provide specific hyperparameters, optimizer settings, or detailed training configurations (e.g., learning rates, batch sizes, epochs) as required for a "Yes" answer. |
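
Since the paper releases no code, the Blackjack variant quoted in the table can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Only the target score of 32, the infinite deck, and the absence of doubling down, splitting, and surrendering come from the description above. The dealer threshold `DEALER_STAND`, the ace handling, the +1/0/-1 payoffs, and the fixed threshold policy `player_stand` are assumptions made here for illustration; the sketch does not implement χ-UCT or any state abstraction.

```python
import random

TARGET = 32        # maximum score, from the description above (21 in ordinary Blackjack)
DEALER_STAND = 27  # hypothetical dealer threshold; not stated in the summary

# Ace stored as 1; 10, J, Q, K all count as 10.
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw(rng):
    """Draw from an infinite deck: each draw is independent and uniform over ranks."""
    return rng.choice(CARD_VALUES)


def hand_value(cards):
    """Hand total, promoting at most one ace to 11 when that does not bust (assumed rule)."""
    total = sum(cards)
    if 1 in cards and total + 10 <= TARGET:
        total += 10
    return total


def play_game(rng, player_stand):
    """Play one game with a fixed threshold policy; return +1 (win), 0 (tie), or -1 (loss)."""
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    # Only "hit" and "stand" exist: no doubling down, splitting, or surrendering.
    while hand_value(player) < player_stand:
        player.append(draw(rng))
    if hand_value(player) > TARGET:
        return -1
    while hand_value(dealer) < DEALER_STAND:
        dealer.append(draw(rng))
    p, d = hand_value(player), hand_value(dealer)
    if d > TARGET or p > d:
        return 1
    return 0 if p == d else -1


def average_return(n_games, player_stand, seed=0):
    """Average return over n_games, the performance measure named in the table."""
    rng = random.Random(seed)
    return sum(play_game(rng, player_stand) for _ in range(n_games)) / n_games
```

Calling `average_return(10**5, player_stand=25)` would mirror the paper's measure of average return over 10^5 games, but for a fixed threshold policy rather than a planner, so its value is not comparable to the reported results.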