Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Implicit Search via Discrete Diffusion: A Study on Chess

Authors: Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive controlled experiments, we show DIFFUSEARCH outperforms both the searchless and explicit search-enhanced policies. Specifically, DIFFUSEARCH outperforms the one-step policy by 19.2% and the MCTS-enhanced policy by 14% on action accuracy. Furthermore, DIFFUSEARCH demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment.
Researcher Affiliation Collaboration 1 The University of Hong Kong; 2 Shanghai Jiaotong University; 3 Huawei Noah's Ark Lab; 4 Shanghai AI Laboratory
Pseudocode Yes Algorithm 1 DIFFUSEARCH Training. Input: dataset $D = \{(s, (a, z))\}$, neural network $f(\cdot; \theta)$, timesteps $T$. Output: model parameters $\theta$. Denote state length $l = |s|$; repeat: draw $(s, (a, z)) \sim D$ and obtain $x_{0,1:N} = s \,\|\, a \,\|\, z$ ($\|$: concat); draw $t \sim \mathrm{Uniform}(\{1, \dots, T\})$; draw $x_{t,n} \sim q(x_{t,n} \mid x_{0,n})$ for $n \in \{l+1, \dots, N\}$; $L(\theta) = -\lambda_t \sum_{n=l+1}^{N} \mathbb{1}_{x_{t,n} \neq x_{0,n}} \, x_{0,n}^{\top} \log f(x_{t,n}; \theta)$; minimize $L(\theta)$ with respect to $\theta$; until converged. Algorithm 2 DIFFUSEARCH Inference. Input: board state $s$, trained network $f(\cdot; \theta)$, timesteps $T$. Output: next action $a$. Denote state length $l = |s|$; initialize $x_{T,1:l} = s$ and $x_{T,l+1:N} \sim q_{\text{noise}}$; for $t = T, \dots, 1$ do: for $n = l+1, \dots, N$ do: draw $\tilde{x}_{0,n} \sim \mathrm{Cat}(f(x_{t,n}; \theta))$; draw $x_{t-1,n} \sim q(x_{t-1,n} \mid x_{t,n}, \tilde{x}_{0,n})$; end for; end for. Return $a = x_{0,l+1}$.
Open Source Code Yes All codes are publicly available at https://github.com/HKUNLP/DiffuSearch.
Open Datasets Yes We construct a dataset for supervised training by downloading games from lichess recorded in February 2023. We utilize Stockfish 16, currently the world's strongest search-based engine, as an oracle to label board states extracted from randomly selected games on lichess.org.
Dataset Splits Yes Table 1: Data statistics (stage, subset: records, games). Train, SA-V (100k): 193,189,573 records, 100,000 games; Train, SA-V (10k): 17,448,268 records, 10,000 games; Train, others (100k): 6,564,661 records, 100,000 games; Train, others (10k): 659,576 records, 10,000 games; Action Test: 62,561 records, 1,000 games; Puzzle Test: 36,816 records, 10,000 games.
Hardware Specification Yes All experiments are done on 8 NVIDIA V100 32G GPUs.
Software Dependencies No No specific software versions for dependencies like PyTorch, TensorFlow, or Python are mentioned. Although GPT-2 transformer architecture and Adam optimizer are mentioned, their versions are not specified. Stockfish 16 is mentioned as an oracle, not a dependency of their code.
Experiment Setup Yes We train all baseline models until convergence and set a maximum of 200 epochs for diffusion models due to their slow convergence. We use the Adam optimizer (Kingma & Ba, 2015), a learning rate of 3e-4, and a batch size of 1024 for all models. By default, we set the horizon h to be 4, the number of network layers to be 8 (with a total parameter size of 7M), the diffusion timesteps to be 20, and an absorbing noise type. By default, 100 simulations are utilized in MCTS-enhanced policy, and its impact is analyzed in Figure 3. We adjust $c_{\text{puct}}$ and τ, constants determining the level of exploration in MCTS, on a held-out set and set them to $c_{\text{puct}}$ = 0.1 and τ = 1 for its superior performance.
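
The two algorithms quoted in the Pseudocode row can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes absorbing noise with a linear schedule (a token is masked with probability t / T), a hypothetical `MASK` token id of -1, and a hypothetical `denoiser` callable standing in for the trained network f; the loss weight λ_t is left as a plain `weight` argument.

```python
import numpy as np

MASK = -1  # hypothetical id for the absorbing [MASK] token (assumption)


def corrupt(x0, state_len, t, T, rng):
    """Sample x_t ~ q(x_t | x_0) under absorbing noise.

    The state prefix is never corrupted. Each later token is replaced by
    MASK independently with probability t / T (assumed linear schedule).
    """
    xt = x0.copy()
    noisy = rng.random(x0.shape[0]) < t / T
    noisy[:state_len] = False
    xt[noisy] = MASK
    return xt


def masked_cross_entropy(probs, x0, xt, weight=1.0):
    """Weighted cross-entropy summed over positions where x_t != x_0."""
    idx = np.flatnonzero(xt != x0)
    if idx.size == 0:
        return 0.0
    return float(-weight * np.log(probs[idx, x0[idx]]).sum())


def infer(state, total_len, T, denoiser, rng):
    """Reverse process: start from all-MASK future tokens and denoise."""
    l = state.shape[0]
    x = np.full(total_len, MASK, dtype=np.int64)
    x[:l] = state
    for t in range(T, 0, -1):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        probs = denoiser(x)  # shape (total_len, vocab_size)
        # Draw a clean-token guess for every masked position.
        x0_hat = np.array(
            [rng.choice(probs.shape[1], p=probs[n]) for n in masked]
        )
        # Under the assumed linear schedule, a masked token is revealed
        # at step t with probability 1 / t, so all are revealed at t = 1.
        reveal = rng.random(masked.size) < 1.0 / t
        x[masked[reveal]] = x0_hat[reveal]
    return int(x[l])  # first token after the state is the next action
```

With a toy uniform denoiser such as `lambda x: np.full((x.shape[0], 8), 1 / 8)` and `T = 20` (the default number of diffusion timesteps reported in the Experiment Setup row), `infer` returns a token id in `range(8)`. A real run would replace the toy denoiser with the trained network and use the tokenization from the released code.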