Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Efficient Learning for AlphaZero via Path Consistency

Authors: Dengwei Zhao, Shikui Tu, Lei Xu

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments also demonstrate the efficiency of PCZero under offline learning setting. Taking Hex, Othello, and Gomoku as examples, the advantage of PCZero will be investigated in both offline and online learning.
Researcher Affiliation	Academia	Dengwei Zhao, Shikui Tu, Lei Xu: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. Correspondence to: Shikui Tu <EMAIL>, Lei Xu <EMAIL>.
Pseudocode	Yes	Algorithm 1 v estimation for a terminated sequence; Algorithm 2 MCTS-PCZero self-play; Algorithm 3 Heuristic Path
Open Source Code	Yes	The source codes are available at https://github.com/CMACH508/PCZero.
Open Datasets	Yes	For Hex, expert dataset is collected by the self-play of MoHex 2.0 (Gao et al., 2018), containing 50K, 101K and 18K games for 8×8, 9×9 and 13×13 Hex respectively. WThor and RenjuNet are adopted as the expert dataset for Othello and Gomoku, containing 126K and 70K games respectively.
Dataset Splits	No	The paper states: "Those datasets are divided into training set and test set randomly and the proportion of test set is 20%." It does not explicitly mention a separate validation set split.
Hardware Specification	Yes	We use 8 GeForce RTX 2080Ti GPU and Intel(R) Xeon(R) Gold 6130 CPU with 125G RAM to do self-play. A single GTX 1050Ti GPU and Intel i7 8750H CPU with 16 GB RAM are used to test.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8) were listed.
Experiment Setup	Yes	During the self-play, MCTS runs 400 simulations to select moves and 1000 games are played in each iteration. For the first 200 epochs, the learning rate r is 0.01 and temperature parameter τ is 0.8. In the following 200 epochs, r = 0.001 and τ = 0.6. For the rest 500 epochs, r = 0.0001 and τ = 0.2. ... λ = 3.0 and l = k = 5. ... c_puct = 1.5 and the search procedure is the same with AlphaZero (Silver et al., 2018).
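
The three-phase schedule quoted in the Experiment Setup row can be written down as a small lookup, which may help when trying to reproduce the run. This is a minimal sketch, not code from the PCZero repository: the names Phase, SCHEDULE, and phase_for_epoch are hypothetical, and zero-based epoch indexing is an assumption. Only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One stage of the training schedule (hypothetical container)."""
    epochs: int           # number of epochs the phase lasts
    learning_rate: float  # r in the quoted setup
    temperature: float    # tau in the quoted setup


# Values quoted in the Experiment Setup row; the layout is illustrative.
SCHEDULE = (
    Phase(epochs=200, learning_rate=0.01, temperature=0.8),
    Phase(epochs=200, learning_rate=0.001, temperature=0.6),
    Phase(epochs=500, learning_rate=0.0001, temperature=0.2),
)

# Other constants quoted in the same row.
MCTS_SIMULATIONS = 400
GAMES_PER_ITERATION = 1000
C_PUCT = 1.5
LAMBDA = 3.0
L_STEPS = K_STEPS = 5  # l = k = 5 in the quoted text


def phase_for_epoch(epoch: int) -> Phase:
    """Return the phase covering a zero-based epoch index (assumed convention)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for phase in SCHEDULE:
        if epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError("epoch lies beyond the quoted schedule")
```

Under these assumptions, phase_for_epoch(250) returns the second phase (r = 0.001, τ = 0.6), and any index past the quoted 900 epochs raises an error rather than silently reusing the last phase.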