Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Enhancing Chess Reinforcement Learning with Graph Representation
Authors: Tomas Rigaux, Hisashi Kashima
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, performed on smaller networks than the initial AlphaZero paper, show that this new architecture outperforms previous architectures with a similar number of parameters, being able to increase playing strength an order of magnitude faster. We also show that the model, when trained on a smaller 5×5 variant of chess, is able to be quickly fine-tuned to play on regular 8×8 chess, suggesting that this approach yields promising generalization abilities. |
| Researcher Affiliation | Academia | Tomas Rigaux, Kyoto University, Kyoto, Japan (EMAIL); Hisashi Kashima, Kyoto University, Kyoto, Japan (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Self-Play Training |
| Open Source Code | Yes | Our code is available at https://github.com/akulen/AlphaGateau. |
| Open Datasets | No | The paper describes using self-play to generate data on 8x8 and 5x5 chess variants. It does not explicitly state the use of a pre-existing, publicly available dataset with concrete access information (link, DOI, citation). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. Data is generated through self-play, and models are evaluated by playing games against each other to estimate Elo ratings. |
| Hardware Specification | Yes | All our models were trained using multiple Nvidia RTX A5000 GPUs (Learning speed used 8 and Fine-tuning used 6), and their Elo ratings were estimated using 6 of those GPUs. |
| Software Dependencies | No | The paper mentions software such as JAX, Pgx, Aim, and statsmodels, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | All models used in these experiments are trained with the Adam optimizer [12] with a learning rate of 0.001. All feature vectors have an embedding dimension of 128. The loss function is the same as for the original AlphaZero, which is, for $f_\theta(s) = (\hat{\pi}, \hat{v})$, $L(\pi, v, \hat{\pi}, \hat{v}) = -\pi^\top \log(\hat{\pi}) + (v - \hat{v})^2$. For our experiments, an iteration consists of generating 256 games through self-play, then doing one epoch of training, split into 3904 mini-batches of size 256, after 7 iterations once the frame window is full. |
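
The loss quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch. It assumes the standard AlphaZero form (a cross-entropy term between the search policy and the predicted policy, plus a squared error on the value), which is what the row says the paper reuses. The function name `alphazero_loss`, its argument names, and the `eps` smoothing constant are hypothetical choices for illustration and are not taken from the authors' code.

```python
import math


def alphazero_loss(pi_target, v_target, pi_pred, v_pred, eps=1e-12):
    """Policy cross-entropy plus squared value error for a single position.

    pi_target: target move probabilities (e.g. from search), summing to 1
    pi_pred:   move probabilities predicted by the network, summing to 1
    v_target:  game outcome used as the value target
    v_pred:    value predicted by the network
    eps:       small constant (an assumption here) that avoids log(0)
    """
    if len(pi_target) != len(pi_pred):
        raise ValueError("policy vectors must have the same length")
    # Cross-entropy term: -pi^T log(pi_hat); zero-probability targets contribute nothing.
    policy_loss = -sum(
        p * math.log(q + eps) for p, q in zip(pi_target, pi_pred) if p > 0
    )
    # Squared error between the target value and the predicted value.
    value_loss = (v_target - v_pred) ** 2
    return policy_loss + value_loss


# Uniform two-move policy predicted exactly, value prediction off by 1.
example = alphazero_loss([0.5, 0.5], 0.0, [0.5, 0.5], 1.0)
```

For scale, the 3904 mini-batches of size 256 quoted in the same row correspond to 3904 × 256 = 999,424 training positions per epoch.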