Deep Counterfactual Regret Minimization
Authors: Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that Deep CFR converges to an ε-Nash equilibrium in two-player zero-sum games and empirically evaluate performance in poker variants, including heads-up limit Texas hold'em. We show that Deep CFR outperforms Neural Fictitious Self Play (NFSP) (Heinrich & Silver, 2016), which was the prior leading function approximation algorithm for imperfect-information games, and that Deep CFR is competitive with domain-specific tabular abstraction techniques. |
| Researcher Affiliation | Collaboration | (1) Facebook AI Research; (2) Computer Science Department, Carnegie Mellon University; (3) Strategic Machine Inc., Strategy Robot Inc., and Optimized Markets Inc. |
| Pseudocode | Yes | Algorithm 1 Deep Counterfactual Regret Minimization |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is available, nor does it provide a link. |
| Open Datasets | No | The paper evaluates on game environments (FHP and HULH poker) rather than on public datasets of the kind used to train supervised-learning models. These games serve as the basis for performance measurement, but the paper provides no concrete access information for a public dataset in the conventional sense. |
| Dataset Splits | No | The paper discusses training and evaluation within game environments but does not specify traditional training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and implicitly PyTorch via citations, but it does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | We perform 4,000 mini-batch stochastic gradient descent (SGD) iterations using a batch size of 10,000 and perform parameter updates using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001, with gradient norm clipping to 1. For HULH we use 32,000 SGD iterations and a batch size of 20,000. |
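The optimizer settings quoted in the Experiment Setup row (Adam, learning rate 0.001, gradient norm clipping to 1) can be sketched as a minimal NumPy training step. This is an illustrative stand-in, not the authors' implementation: the quadratic toy loss and the `AdvantageNet`-style flat parameter vector are hypothetical, and only the hyperparameter values come from the paper.

```python
import numpy as np

# Hyperparameters reported in the paper for the FHP experiments
# (HULH instead uses 32,000 SGD iterations and a batch size of 20,000).
LEARNING_RATE = 1e-3
CLIP_NORM = 1.0
BATCH_SIZE = 10_000
SGD_ITERS = 4_000

def clip_grad_norm(grad, max_norm):
    """Scale the gradient so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

class Adam:
    """Minimal Adam optimizer (Kingma & Ba, 2014) for a flat parameter vector."""
    def __init__(self, lr=LEARNING_RATE, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        # Exponential moving averages of the gradient and its square.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Bias-corrected estimates.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy demonstration: minimize ||params||^2 with clipped Adam updates.
rng = np.random.default_rng(0)
params = rng.normal(size=8)
opt = Adam()
for _ in range(5000):
    grad = 2 * params                       # gradient of the toy loss
    grad = clip_grad_norm(grad, CLIP_NORM)  # gradient norm clipping to 1
    params = opt.step(params, grad)
print(float(np.sum(params ** 2)))
```

In the paper each network is trained from the sampled advantage/strategy memories with mini-batches of the size given above; the toy loop here only exercises the optimizer and clipping logic.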