Opponent-Limited Online Search for Imperfect Information Games

Authors: Weiming Liu, Haobo Fu, Qiang Fu, Yang Wei

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we test Safe-1-KLSS and two OLSS algorithms in multiple IIGs, including small and medium poker games, whose infoset sizes vary from 6 to 2600, and one large-scale two-player Mahjong game (Fu et al., 2022a), whose infoset size is approximately 10^11. Safe-1-KLSS and OLSS-I are evaluated in all the poker games. OLSS-II is only evaluated in Mahjong, not in poker, because it is designed for larger-scale games.
Researcher Affiliation Industry Tencent AI Lab, Shenzhen, China. Correspondence to: Haobo Fu <haobofu@tencent.com>.
Pseudocode No The paper provides mathematical formulations and definitions but does not include any explicit pseudocode blocks or algorithms.
Open Source Code No The paper mentions: "The experiments are conducted based on the OpenSpiel project (Lanctot et al., 2019). The license is Apache-2.0." This refers to a third-party framework the authors used, not to source code for their own implementation.
Open Datasets Yes We first test our algorithms in four Leduc poker (Southey et al., 2012) and one Flop Hold'em Poker (FHP) (Brown et al., 2019). ... In this section, we test OLSS-II in Two-player Mahjong (Fu et al., 2022a).
Dataset Splits No The paper describes game environments and evaluation methods (e.g., 200,000 games, 100,000 decks for Mahjong) but does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification Yes Blueprint models are trained for two days with 8 V100 GPUs and 1200 CPUs. And we use 8 V100 GPUs and 2400 CPUs to train the environmental model.
Software Dependencies No The paper mentions software like "OpenSpiel" and algorithms like "MCCFR" and the "ACH algorithm" but does not specify version numbers for any of these components.
Experiment Setup Yes Table 4: Hyper-parameters used for the blueprint.

Group   Parameter                     Range               Best
Shared  Ratio clip (ϵ)                -                   0.5
Shared  GAE (λ)                       -                   0.95
Shared  Learning rate                 {2.5e-3, 2.5e-4}    2.5e-4
Shared  Discount factor (γ)           -                   0.995
Shared  Value loss coefficient (α)    -                   0.5
Shared  Batch size                    {4096, 8192}        8192
ACH     Entropy coefficient (β)       {0.1, 1e-4}         5e-4
ACH     Logit threshold (lth)         -                   8
PPO     Entropy coefficient (β)       {0.1, 1e-4}         0.01
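
For anyone re-running the blueprint training, the "Best" values from Table 4 could be collected in a small configuration object. The following is only a sketch: the class name, field names, and the choice to leave the logit threshold unset for PPO are assumptions made for illustration, not part of the paper or of any released code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlueprintHyperParams:
    """Hypothetical container for the 'Best' column of Table 4."""
    # Shared settings (defaults are the reported best values).
    ratio_clip: float = 0.5          # epsilon
    gae_lambda: float = 0.95         # lambda
    learning_rate: float = 2.5e-4
    discount: float = 0.995          # gamma
    value_loss_coef: float = 0.5     # alpha
    batch_size: int = 8192
    # Algorithm-specific settings.
    entropy_coef: float = 5e-4       # beta
    logit_threshold: Optional[float] = None  # listed for ACH only


# ACH blueprint: entropy coefficient 5e-4, logit threshold 8.
ACH_CONFIG = BlueprintHyperParams(entropy_coef=5e-4, logit_threshold=8.0)

# PPO blueprint: entropy coefficient 0.01; Table 4 lists no logit threshold.
PPO_CONFIG = BlueprintHyperParams(entropy_coef=0.01)
```

Keeping the shared values as defaults means the two configurations differ only in the fields that Table 4 lists separately for ACH and PPO.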