Designing Skill-Compatible AI: Methodologies and Frameworks in Chess

Authors: Karim Hamade, Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, Ashton Anderson

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our agents outperform state-of-the-art chess AI (based on AlphaZero) despite being weaker in conventional chess, demonstrating that skill-compatibility is a tangible trait that is qualitatively and measurably distinct from raw performance. Our evaluations further explore and clarify the mechanisms by which our agents achieve skill-compatibility."
Researcher Affiliation | Collaboration | Karim Hamade (kar@cs.toronto.edu, University of Toronto); Reid McIlroy-Young (reidmcy@cs.toronto.edu, University of Toronto); Siddhartha Sen (sidsen@microsoft.com, Microsoft Research); Jon Kleinberg (kleinberg@cornell.edu, Cornell University); Ashton Anderson (ashton@cs.toronto.edu, University of Toronto)
Pseudocode | No | The paper contains no sections explicitly labeled "Pseudocode" or "Algorithm", nor does it present structured steps in a code-like format.
Open Source Code | Yes | "Our code is released at github.com/CSSLab/skill-compatibility-chess. We also include several of our trained models."
Open Datasets | No | The paper states that maia was trained on games from lichess.org, an open-source platform, but it provides no direct link, DOI, or specific repository for the *dataset* used, nor a formal citation of the dataset itself; only the platform is named.
Dataset Splits | Yes | "To create att, a dataset of 10,000 games (80% train, 10% validation, and 10% test) is generated of the following game: leela maia leela maia for STT, or leela maia leela maia for HB."
Hardware Specification | Yes | "We made use of four Tesla K80 GPUs for the purpose of experimentation, each with a VRAM of 12 GB."
Software Dependencies | Yes | "Against stockfish 13 (60k nodes), a strong classical engine that uses alpha-beta search, this version of leela obtains a score of 59 ± 3."
Experiment Setup | Yes | "To create att, a dataset of 10,000 games (80% train, 10% validation, and 10% test) is generated of the following game: leela maia leela maia for STT, or leela maia leela maia for HB. Then, starting with leela's weights, and using a learning rate of 10⁻⁵ and 10,000 iterations, we run back-propagation to update leela's policy and value neural network."
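The 80/10/10 dataset split reported above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_games` and the use of shuffled indices are assumptions; only the 10,000-game count and the 80%/10%/10% proportions come from the paper.

```python
import random

def split_games(n_games, seed=0):
    """Shuffle game indices and split them 80% / 10% / 10%
    into train / validation / test sets (hypothetical helper)."""
    rng = random.Random(seed)
    idx = list(range(n_games))
    rng.shuffle(idx)
    n_train = int(0.8 * n_games)
    n_val = int(0.1 * n_games)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = split_games(10_000)
# sizes: 8000 / 1000 / 1000
```

Shuffling before slicing keeps the three sets disjoint while avoiding any ordering bias in how the games were generated.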
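The fine-tuning schedule in the Experiment Setup row (back-propagation from leela's weights, learning rate 10⁻⁵, 10,000 iterations) can be sketched with a toy stand-in. The "network" here is a single scalar weight and the loss is squared error; only the learning rate and iteration count mirror the reported setup, and the function name `finetune` is an assumption.

```python
# Hyperparameters taken from the paper's quoted setup.
LEARNING_RATE = 1e-5
ITERATIONS = 10_000

def finetune(w0, target):
    """Gradient descent on the toy loss (w - target)^2,
    standing in for back-propagation through leela's network."""
    w = w0
    for _ in range(ITERATIONS):
        grad = 2.0 * (w - target)  # d/dw of (w - target)^2
        w -= LEARNING_RATE * grad
    return w

w = finetune(0.0, 1.0)
```

With so small a learning rate, 10,000 iterations move the weight only partway toward the target (here roughly 18% of the gap), which matches the intuition of a gentle fine-tune that stays close to the starting weights rather than a from-scratch training run.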