Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Opponent-Limited Online Search for Imperfect Information Games

Authors: Weiming Liu, Haobo Fu, Qiang Fu, Yang Wei

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we test Safe-1-KLSS and two OLSS algorithms in multiple IIGs, including small and medium poker games, whose infoset sizes vary from 6 to 2600, and one large-scale two-player Mahjong game (Fu et al., 2022a), whose infoset size is approximately 10^11. Safe-1-KLSS and OLSS-I are evaluated in all the poker games. OLSS-II is only evaluated in Mahjong, not in poker, because it is designed for larger-scale games. Indeed, OLSS-II is the only algorithm available for the Mahjong benchmark.
Researcher Affiliation	Industry	Tencent AI Lab, Shenzhen, China. Correspondence to: Haobo Fu <EMAIL>.
Pseudocode	No	The paper provides mathematical formulations and definitions but does not include any explicit pseudocode blocks or algorithms.
Open Source Code	No	The paper mentions: "The experiments are conducted based on the Open Spiel project (Lanctot et al., 2019). The license is Apache-2.0." This refers to a third-party framework used, not their own implementation's source code.
Open Datasets	Yes	We first test our algorithms in four Leduc poker (Southey et al., 2012) and one Flop Hold'em Poker (FHP) (Brown et al., 2019). ... In this section, we test OLSS-II in Two-player Mahjong (Fu et al., 2022a).
Dataset Splits	No	The paper describes game environments and evaluation methods (e.g., 200,000 games, 100,000 decks for Mahjong) but does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification	Yes	Blueprint models are trained for two days with 8 V100 GPUs and 1200 CPUs. And we use 8 V100 GPUs and 2400 CPUs to train the environmental model.
Software Dependencies	No	The paper mentions software like "Open Spiel" and algorithms like "MCCFR" and "ACH algorithm" but does not specify version numbers for any of these components.
Experiment Setup	Yes	Table 4: Hyper-parameters used for the blueprint (columns: Parameter, Range, Best). Shared: Ratio clip (ϵ): 0.5; GAE (λ): 0.95; Learning rate: range {2.5e-3, 2.5e-4}, best 2.5e-4; Discount factor (γ): 0.995; Value loss coefficient (α): 0.5; Batch size: range {4096, 8192}, best 8192. ACH: Entropy coefficient (β): range {0.1, 1e-4}, best 5e-4; Logit threshold (lth): 8. PPO: Entropy coefficient (β): range {0.1, 1e-4}, best 0.01.
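
For readers attempting a reproduction, the Table 4 values quoted in the Experiment Setup row could be captured in a small configuration sketch such as the one below. Only the numeric values come from the quoted table; the dictionary layout, the key names, and the helper hparams_for are hypothetical and are not taken from the paper or from any released code. Parameters listed with a single value are assumed here to be the values actually used.

```python
# Hypothetical configuration sketch. Only the numbers are from Table 4 as
# quoted above; all key names and the structure are illustrative assumptions.
BLUEPRINT_HPARAMS = {
    "shared": {
        "ratio_clip": 0.5,          # epsilon
        "gae_lambda": 0.95,         # lambda
        "learning_rate": 2.5e-4,    # best of the listed range {2.5e-3, 2.5e-4}
        "discount_factor": 0.995,   # gamma
        "value_loss_coef": 0.5,     # alpha
        "batch_size": 8192,         # best of the listed range {4096, 8192}
    },
    "ach": {
        "entropy_coef": 5e-4,       # beta, listed as best for ACH
        "logit_threshold": 8,       # l_th
    },
    "ppo": {
        "entropy_coef": 0.01,       # beta, listed as best for PPO
    },
}


def hparams_for(algorithm: str) -> dict:
    """Merge the shared settings with the algorithm-specific ones."""
    if algorithm not in ("ach", "ppo"):
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return {**BLUEPRINT_HPARAMS["shared"], **BLUEPRINT_HPARAMS[algorithm]}


if __name__ == "__main__":
    for name in ("ach", "ppo"):
        print(name, hparams_for(name))
```

Keeping the shared and per-algorithm settings separate mirrors the grouping of the quoted table and makes it easy to see which values differ between ACH and PPO.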