Enhancing Chess Reinforcement Learning with Graph Representation
Authors: Tomas Rigaux, Hisashi Kashima
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, performed on smaller networks than the initial AlphaZero paper, show that this new architecture outperforms previous architectures with a similar number of parameters, being able to increase playing strength an order of magnitude faster. We also show that the model, when trained on a smaller 5x5 variant of chess, is able to be quickly fine-tuned to play on regular 8x8 chess, suggesting that this approach yields promising generalization abilities. |
| Researcher Affiliation | Academia | Tomas Rigaux, Kyoto University, Kyoto, Japan, tomas@rigaux.com; Hisashi Kashima, Kyoto University, Kyoto, Japan, kashima@i.kyoto-u.ac.jp |
| Pseudocode | Yes | Algorithm 1: Self-Play Training (a hedged sketch of such a loop appears after the table) |
| Open Source Code | Yes | Our code is available at https://github.com/akulen/AlphaGateau. |
| Open Datasets | No | The paper describes using self-play to generate data on 8x8 and 5x5 chess variants. It does not explicitly state the use of a pre-existing, publicly available dataset with concrete access information (link, DOI, citation). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. Data is generated through self-play, and models are evaluated by playing games against each other to estimate Elo ratings (a hedged Elo-fitting sketch appears after the table). |
| Hardware Specification | Yes | All our models were trained using multiple Nvidia RTX A5000 GPUs (the learning-speed experiments used 8 and the fine-tuning experiments used 6), and their Elo ratings were estimated using 6 of those GPUs. |
| Software Dependencies | No | The paper mentions software like JAX, PGX, Aim, and statsmodels, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | All models used in these experiments are trained with the Adam optimizer [12] with a learning rate of 0.001. All feature vectors have an embedding dimension of 128. The loss function is the same as for the original AlphaZero, which is, for $f_\theta(s) = (\hat{\pi}, \hat{v})$: $L(\pi, v, \hat{\pi}, \hat{v}) = -\pi^T \log(\hat{\pi}) + (v - \hat{v})^2$. For our experiments, an iteration consists of generating 256 games through self-play, then doing one epoch of training, split into 3904 mini-batches of size 256 once the frame window is full, after 7 iterations. (A hedged JAX sketch of this loss appears after the table.) |
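
The paper's pseudocode is only named in the table (Algorithm 1: Self-Play Training), so below is a minimal Python sketch of a generic AlphaZero-style self-play training loop, wired to the iteration structure quoted in the Experiment Setup row (256 self-play games per iteration, a frame window that is full after 7 iterations, mini-batches of 256). The function names, the frame format, and the sliding-window mechanics are assumptions, not the paper's actual code.

```python
# Hypothetical sketch of an AlphaZero-style self-play training loop.
from collections import deque
import random

GAMES_PER_ITER = 256   # paper: 256 self-play games per iteration
BATCH_SIZE = 256       # paper: mini-batches of size 256
WINDOW_ITERS = 7       # paper: frame window is full after 7 iterations

def self_play_game(params):
    """Stub: play one game with search guided by the current network.
    Returns a list of (state, policy_target, value_target) frames."""
    return [(None, [1.0], 0.0)]  # placeholder frame

def train_one_epoch(params, frames):
    """Stub: one epoch of optimizer updates over the frame window."""
    for start in range(0, len(frames), BATCH_SIZE):
        batch = frames[start:start + BATCH_SIZE]
        # one Adam update on `batch` would go here
    return params

def self_play_training(params, num_iterations):
    window = deque(maxlen=WINDOW_ITERS)  # sliding window of recent iterations
    for _ in range(num_iterations):
        games = [self_play_game(params) for _ in range(GAMES_PER_ITER)]
        window.append([frame for game in games for frame in game])
        frames = [f for iteration in window for f in iteration]
        random.shuffle(frames)  # decorrelate frames before batching
        params = train_one_epoch(params, frames)
    return params
```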
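
The Elo estimation is described only as models playing games against each other. Since the paper lists statsmodels among its dependencies, one plausible reconstruction (an assumption, not the paper's documented procedure) is a Bradley-Terry-style logistic regression over pairwise results, converted to the Elo scale; draws are ignored in this sketch.

```python
# Hypothetical Elo fit from pairwise game results via logistic regression.
import numpy as np
import statsmodels.api as sm

# Toy results: (winner_index, loser_index) pairs among n players.
games = [(0, 1), (1, 2), (2, 0), (0, 1), (1, 2), (0, 2)]
n = 3

# One row per game: +1 in the winner's column, -1 in the loser's;
# the response is always 1 ("first player won").
X = np.zeros((len(games), n))
y = np.ones(len(games))
for row, (w, l) in enumerate(games):
    X[row, w] = 1.0
    X[row, l] = -1.0

# Drop player 0's column to anchor their rating at 0: ratings are only
# identified up to an additive constant.
fit = sm.Logit(y, X[:, 1:]).fit(disp=0)

# Convert natural-log odds to the Elo scale (400 / ln 10 per logit unit).
elo = np.concatenate([[0.0], fit.params]) * 400.0 / np.log(10.0)
print(elo - elo.mean())  # center the ratings for readability
```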
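
As a reading aid for the loss in the Experiment Setup row, here is a minimal JAX sketch of $L(\pi, v, \hat{\pi}, \hat{v}) = -\pi^T \log(\hat{\pi}) + (v - \hat{v})^2$ with the reported Adam learning rate. The paper lists JAX as a dependency, but the use of optax for the optimizer and the log-stabilizing epsilon are assumptions.

```python
# Sketch of the AlphaZero-style loss quoted in the table, in JAX.
import jax.numpy as jnp
import optax  # assumption: optax is not named in the paper

def alphazero_loss(pi, v, pi_hat, v_hat):
    """L = -pi^T log(pi_hat) + (v - v_hat)^2, averaged over the batch."""
    policy_loss = -jnp.sum(pi * jnp.log(pi_hat + 1e-9), axis=-1)  # cross-entropy
    value_loss = (v - v_hat) ** 2                                  # squared error
    return jnp.mean(policy_loss + value_loss)

# Adam with the learning rate reported in the paper.
optimizer = optax.adam(learning_rate=1e-3)
```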