Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Maia-2: A Unified Model for Human-AI Alignment in Chess

Authors: Zhenwei Tang, Difan Jiao, Reid McIlroy-Young, Jon Kleinberg, Siddhartha Sen, Ashton Anderson

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results demonstrate that this unified framework significantly enhances the alignment between AI and human players across a diverse range of expertise levels, paving the way for deeper insights into human decision-making and AI-guided teaching tools."
Researcher Affiliation | Collaboration | Zhenwei Tang (University of Toronto), Difan Jiao (University of Toronto), Reid McIlroy-Young (Harvard University), Jon Kleinberg (Cornell University), Siddhartha Sen (Microsoft Research), Ashton Anderson (University of Toronto)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our implementation is available here."
Open Datasets | Yes | "We train Maia-2 on Lichess games played between Jan 2013 and Nov 2023, with the exception of December 2019, since that is the month used for testing in the original Maia paper (and we also test on this month for consistency) [1]." The authors use data from Lichess, a well-known large open-source chess platform, and its open database.
Dataset Splits | No | The paper mentions a 'training set' and multiple 'test sets' (Maia Testset, Cross-skill Testset, Grounded Testset), but it does not describe a distinct 'validation' split.
Hardware Specification | Yes | "It took approximately 13 days to train Maia-2 with 2 A100 (80G) GPUs under our default settings."
Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9); it only mentions general components such as the ResNet-based backbone architecture.
Experiment Setup | Yes | "We report all hyperparameters involved in training Maia-2 in Table 5" (in the Appendix).