State Aggregation in Monte Carlo Tree Search

Authors: Jesse Hostetler, Alan Fern, Tom Dietterich

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a proof of concept, we experimentally confirm that state aggregation can improve the finite-sample performance of UCT. This section presents a small experiment that demonstrates the sample complexity benefits of abstraction.
Researcher Affiliation | Academia | Jesse Hostetler, Alan Fern, and Tom Dietterich; Department of Electrical Engineering and Computer Science, Oregon State University; {hostetje, afern, tgd}@eecs.oregonstate.edu
Pseudocode | No | The paper describes algorithms and their modifications in paragraph form but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | No | Our experimental domain is a version of the card game Blackjack. We play to a maximum score of 32, instead of 21 for ordinary Blackjack. This makes the planning horizon longer, which allows abstraction to have a larger effect. We draw from an infinite deck so that card counting is not helpful, and we do not allow doubling down, splitting pairs, or surrendering. The paper describes a custom experimental domain based on Blackjack but does not provide any concrete access information (link, DOI, citation) to a publicly available dataset.
Dataset Splits | No | The paper mentions running experiments for "varying sample limits" and measuring "average return over 10^5 games" but does not provide specific details on training, validation, or test splits of any dataset.
Hardware Specification | No | No specific hardware details are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | No | We ran χ-UCT with the four representations for varying sample limits. The performance measure is the average return over 10^5 games. While the paper describes the experimental task (Blackjack variation) and the number of games played, it does not provide specific hyperparameters, optimizer settings, or detailed training configurations (e.g., learning rates, batch sizes, epochs) as required for a "Yes" answer.
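
Since no code or domain definition is released, the Blackjack variant and the average-return measure quoted above can only be approximated. The following is a minimal sketch, not the authors' implementation. It takes from the text only the target score of 32, the infinite deck, and the restriction to hit/stand. Everything else is an assumption made for illustration: the ace handling, the dealer's stand threshold (`TARGET - 4`, by analogy with 17 for ordinary Blackjack), the +1/0/-1 payoffs, and the fixed-threshold player policy that stands in for the χ-UCT planner.

```python
import random

TARGET = 32  # maximum score stated in the text (ordinary Blackjack uses 21)

# Infinite deck: every draw is independent with fixed rank probabilities,
# so card counting carries no information. Ranks 2-9, four ten-valued
# ranks (10, J, Q, K), and the ace, counted as 11 until demoted.
CARD_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]


def draw(rng):
    """Draw one card value from the infinite deck."""
    return rng.choice(CARD_VALUES)


def hand_value(cards):
    """Best total: aces count 11, demoted to 1 one at a time to avoid a bust.

    ASSUMPTION: the source does not say how aces are scored.
    """
    total = sum(cards)
    soft_aces = cards.count(11)
    while total > TARGET and soft_aces > 0:
        total -= 10
        soft_aces -= 1
    return total


def play_game(rng, player_stand_at, dealer_stand_at=TARGET - 4):
    """Play one hit/stand-only game and return +1 (win), 0 (tie), or -1 (loss).

    ASSUMPTIONS: threshold policies for player and dealer, and the payoff
    values, are hypothetical; the source specifies none of them.
    """
    player = [draw(rng), draw(rng)]
    while hand_value(player) < player_stand_at:
        player.append(draw(rng))
    player_total = hand_value(player)
    if player_total > TARGET:
        return -1  # player busts

    dealer = [draw(rng), draw(rng)]
    while hand_value(dealer) < dealer_stand_at:
        dealer.append(draw(rng))
    dealer_total = hand_value(dealer)

    if dealer_total > TARGET or player_total > dealer_total:
        return 1
    return 0 if player_total == dealer_total else -1


def average_return(n_games, player_stand_at, seed=0):
    """Monte Carlo estimate of the mean return over n_games independent games."""
    rng = random.Random(seed)
    total = sum(play_game(rng, player_stand_at) for _ in range(n_games))
    return total / n_games
```

Calling `average_return(100_000, player_stand_at=28)` mirrors the 10^5-game evaluation size mentioned in the Experiment Setup row. The value it returns reflects only the assumed rules and policy above and is not comparable to the paper's reported results.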