Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Are AlphaZero-like Agents Robust to Adversarial Perturbations?

Authors: Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimentally, we show that both Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, on 58% of the AlphaGo Zero self-play games, our method can make the widely used KataGo agent with 50 simulations of MCTS plays a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players, and 90% of examples indeed lead the Go agent to play an obviously inferior action.
Researcher Affiliation	Academia	Li-Cheng Lan¹, Huan Zhang², Ti-Rong Wu³, Meng-Yu Tsai⁴, I-Chen Wu³˒⁴, Cho-Jui Hsieh¹; ¹UCLA, ²CMU, ³Academia Sinica, Taiwan, ⁴NYCU
Pseudocode	Yes	Algorithm 1 Two-Step Value Attack
Open Source Code	Yes	Our code is available at https://PaperCode.cc/GoAttack.
Open Datasets	Yes	For the datasets, we selected 99 games from five different sources, which are AlphaGo Zero 40 blocks training self-play record (ZZ), AlphaGo Zero vs AlphaGo Master (ZM), AlphaGo Master vs Human champions (MH), the final games of LG Cup World Go Championship (2001-2020) (LG), and the final games of Asian TV Cup (2001-2020) (ATV). Note that the thinking time for ATV Cup is much shorter than LG Cup, so we expect them to reflect human games with different strengths. All the datasets have 20 games, except ZZ has 19 games since the first game is played by two random agents. ... plus an additional FOX[1] dataset, to represent amateur players... [1] https://www.foxwq.com/
Dataset Splits	No	The paper lists the datasets used and the number of games in each but does not provide specific information about how these datasets were split into training, validation, and test sets for the experiments.
Hardware Specification	No	The paper does not provide specific details regarding the hardware used to conduct the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper mentions the use of Go AI programs like KataGo, Leela Zero, ELF OpenGo, and CGI, but it does not specify the version numbers for these software components or any other ancillary software dependencies like programming languages or libraries.
Experiment Setup	Yes	We use KataGo (40 blocks) with 800 simulations as our examiner. For the thresholds, we set η_eq = 0.1, η_correct = 0.15, since after testing several different η_eq and η_correct values, this pair leads to more human-understandable results.
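
The game counts quoted in the Open Datasets row can be checked with a short tally. This is only a minimal sketch: the dictionary name and layout are illustrative assumptions and are not taken from the paper's code. Only the source abbreviations and per-source counts come from the row itself, and the FOX dataset is left out because its size is not stated here.

```python
# Per-source game counts as quoted in the Open Datasets row.
# `games_per_source` is a hypothetical name used only for this check.
games_per_source = {
    "ZZ": 19,   # self-play record; the first game (two random agents) is excluded
    "ZM": 20,
    "MH": 20,
    "LG": 20,
    "ATV": 20,
}

# Sum the counts across the five sources.
total_games = sum(games_per_source.values())
print(total_games)  # 99, matching the "99 games" stated in the row
```

The tally agrees with the stated total: four sources of 20 games plus 19 from ZZ.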