Modifying MCTS for Human-Like General Video Game Playing

Authors: Ahmed Khalifa, Aaron Isaksen, Julian Togelius, Andy Nealen

IJCAI 2016

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that our modified MCTS agent, called BoT, plays quantitatively similar to human players as measured by the distribution of repeated actions. A survey of human observers reveals that the agent exhibits human-like playing style in some games but not others. Our aim is not just mimicking human distributions, but to combine human-like action distributions with the generality and positive behaviors of MCTS. An algorithm that simply sampled actions from the human distribution would fit perfectly but would be an atrociously bad game player. In order to make sure that the modified MCTS is still generic, we also use score and win rate as performance metrics.
Researcher Affiliation Academia Ahmed Khalifa, Aaron Isaksen, Julian Togelius, Andy Nealen; New York University, Tandon School of Engineering; ahmed.khalifa@nyu.edu, aisaksen@nyu.edu, julian@togelius.com, nealen@nyu.edu
Pseudocode No The paper describes the steps of the MCTS algorithm and its modifications in paragraph form but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code No The paper does not provide any statement or link indicating that the source code for the described methodology (the BoT agent) is publicly available.
Open Datasets No The paper mentions collecting human play traces and using the GVG-AI competition framework, but it provides no concrete access information (e.g., a link, DOI, or specific repository) for the collected human play traces, nor does it cite the play traces as a publicly available dataset with authors and year.
Dataset Splits No The paper does not provide specific details about training, validation, or test dataset splits for the data used to develop or evaluate their agent. It describes data collection for analysis and comparison, and a user study for evaluation, but not conventional dataset splitting.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies No The paper mentions the use of the GVG-AI competition framework, but it does not specify any particular software dependencies with version numbers (e.g., programming languages, libraries, or specific software packages like PyTorch 1.9 or CPLEX 12.4).
Experiment Setup Yes UCB_j = X_j + C... where C is a constant to encourage exploration (Section 4). The parameters for H_j are calculated from the histograms recorded from human play traces in Section 3 (Section 4.1). ...where E is a constant for the contribution of the bonus exploration term (Section 4.3). A value of Q = 0.25 has an approximately best effect, chosen by visual inspection of the output behavior to make it appear most human (Section 4.4).
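The quoted setup fragments can be pictured with a small sketch. Below is a minimal, hypothetical illustration of UCB-style child selection with an added human-bias bonus weighted by a constant E; the function name, the `human_bias` stand-in for the paper's histogram-derived H_j term, and the default constants are assumptions for illustration, not the paper's exact formula.

```python
import math

def ucb_score(child_value, child_visits, parent_visits, human_bias,
              C=1.4, E=0.25):
    """Hypothetical UCB variant: standard UCB1 exploration term plus a
    bonus, weighted by E, favoring actions that are common in human play.
    `human_bias` is a stand-in for the paper's histogram-derived term."""
    if child_visits == 0:
        return float("inf")  # expand unvisited children first
    exploitation = child_value / child_visits
    exploration = C * math.sqrt(2 * math.log(parent_visits) / child_visits)
    return exploitation + exploration + E * human_bias

# Pick the child action with the highest modified UCB score
# (toy statistics, assumed for illustration only).
children = [
    {"action": "left",  "value": 3.0, "visits": 5, "h": 0.4},
    {"action": "right", "value": 2.0, "visits": 3, "h": 0.1},
    {"action": "nil",   "value": 0.0, "visits": 0, "h": 0.5},
]
best = max(children,
           key=lambda c: ucb_score(c["value"], c["visits"], 20, c["h"]))
```

Here the unvisited `nil` child is selected first, as in standard UCB1; once all children have been visited, the E-weighted bonus tilts selection toward actions humans favor.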
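The "distribution of repeated actions" metric used to compare the agent with human players can be sketched as counting run lengths of identical consecutive actions in a play trace. This is an illustrative guess at how such a measurement could work, not the paper's exact statistic.

```python
from collections import Counter

def repeat_length_distribution(actions):
    """Empirical distribution of run lengths of repeated actions in a
    play trace (e.g., three 'left' presses in a row is one run of 3).
    Illustrative only; the paper's precise statistic may differ."""
    if not actions:
        return {}
    runs, run = [], 1
    for prev, cur in zip(actions, actions[1:]):
        if cur == prev:
            run += 1
        else:
            runs.append(run)
            run = 1
    runs.append(run)
    total = len(runs)
    return {length: count / total for length, count in Counter(runs).items()}

# Toy trace: runs are 'left' x3, 'right' x1, 'nil' x2.
trace = ["left", "left", "left", "right", "nil", "nil"]
dist = repeat_length_distribution(trace)
```

Comparing such a distribution from an agent's traces against one from human traces (e.g., with a distance measure between the two histograms) is one way to quantify "plays like a human."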