
Unveiling Concepts Learned by a World-Class Chess-Playing Agent

Authors: Aðalsteinn Pálsson, Yngvi Björnsson

IJCAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments need an external dataset to generate the concept probes. For that we use a dataset generated by Leela Chess Zero that is listed as a quality dataset (training_data at [Stockfish, 2022d]), from which we randomly sampled 100k positions.
Researcher Affiliation	Academia	Aðalsteinn Pálsson, Yngvi Björnsson, Department of Computer Science, Reykjavik University
Pseudocode	No	No pseudocode or algorithm blocks were found in the paper.
Open Source Code	No	The paper mentions Stockfish is open-source and provides links to its general project pages (e.g., https://stockfishchess.org/ and https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md), but it does not explicitly state that the code for the interpretability methods developed or used in this specific paper is available or open-sourced.
Open Datasets	Yes	For that we use a dataset generated by Leela Chess Zero that is listed as a quality dataset (training_data at [Stockfish, 2022d]), from which we randomly sampled 100k positions. Reference: [Stockfish, 2022d] Stockfish. Training datasets. https://github.com/glinscott/nnue-pytorch/wiki/Training-datasets, 2022. Accessed: 2022-01-30.
Dataset Splits	Yes	The error bars of Figures 6 and 7 show the standard error of the mean of cross-validation results over five splits.
Hardware Specification	No	No specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments were provided in the paper.
Software Dependencies	No	The paper mentions 'version 14.1 of Stockfish' and 'ridge regression' but does not provide a comprehensive list of specific software dependencies with version numbers (e.g., Python, PyTorch, scikit-learn versions) required to reproduce the experiments.
Experiment Setup	Yes	For each probe, we perform a hyperparameter search over alpha values (the L2 term multiplier) of [0.01, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000].
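
The Dataset Splits and Experiment Setup rows together describe a ridge-regression probe tuned over a fixed alpha grid and evaluated with five-fold cross-validation, reported with the standard error of the mean. The following is a minimal sketch of that procedure, not the authors' code: the closed-form NumPy solver, the R^2 scoring metric, the shuffling seed, the function names, and the synthetic data are all assumptions made for illustration. Only the alpha grid and the number of splits come from the rows above.

```python
import numpy as np

# Alpha grid quoted in the Experiment Setup row (the L2 term multiplier).
ALPHAS = [0.01, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]


def fit_ridge(X, y, alpha):
    """Closed-form ridge fit with an unpenalised intercept (assumed solver)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumed metric; the table names none)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def cross_validate_alpha(X, y, alpha, n_splits=5, seed=0):
    """Return (mean R^2, standard error of the mean) over n_splits folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, b = fit_ridge(X[train_idx], y[train_idx], alpha)
        scores.append(r2_score(y[test_idx], X[test_idx] @ w + b))
    scores = np.asarray(scores)
    sem = scores.std(ddof=1) / np.sqrt(n_splits)
    return scores.mean(), sem


def search_alpha(X, y, alphas=ALPHAS):
    """Score every alpha and pick the one with the highest mean R^2."""
    results = {a: cross_validate_alpha(X, y, a) for a in alphas}
    best = max(results, key=lambda a: results[a][0])
    return best, results


if __name__ == "__main__":
    # Synthetic stand-in for (network activations, concept values);
    # this is NOT the Leela Chess Zero data referenced in the table.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=500)
    best_alpha, results = search_alpha(X, y)
    mean_r2, sem = results[best_alpha]
    print(f"best alpha={best_alpha}, R^2={mean_r2:.3f} +/- {sem:.3f}")
```

To reproduce the paper's probes, `X` would hold activations extracted from the network for the sampled positions and `y` the corresponding concept values; how those are extracted is not covered by the rows above, so it is left out of this sketch.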