Deep Counterfactual Regret Minimization
Authors: Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that Deep CFR converges to an ε-Nash equilibrium in two-player zero-sum games and empirically evaluate performance in poker variants, including heads-up limit Texas hold'em. We show that Deep CFR outperforms Neural Fictitious Self Play (NFSP) (Heinrich & Silver, 2016), which was the prior leading function approximation algorithm for imperfect-information games, and that Deep CFR is competitive with domain-specific tabular abstraction techniques. |
| Researcher Affiliation | Collaboration | (1) Facebook AI Research; (2) Computer Science Department, Carnegie Mellon University; (3) Strategic Machine Inc., Strategy Robot Inc., and Optimized Markets Inc. |
| Pseudocode | Yes | Algorithm 1 Deep Counterfactual Regret Minimization |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is available, nor does it provide a link. |
| Open Datasets | No | The paper evaluates on game environments (FHP and HULH poker) rather than on public datasets of the kind used to train supervised-learning models. These games serve as the basis for performance measurement, but the paper provides no concrete access information for a public dataset in the conventional sense. |
| Dataset Splits | No | The paper discusses training and evaluation within game environments but does not specify traditional training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and implicitly PyTorch via citations, but it does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | We perform 4,000 mini-batch stochastic gradient descent (SGD) iterations using a batch size of 10,000 and perform parameter updates using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001, with gradient norm clipping to 1. For HULH we use 32,000 SGD iterations and a batch size of 20,000. |
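The optimizer settings quoted in the Experiment Setup row (Adam, learning rate 0.001, gradient norm clipping to 1) can be sketched as a minimal NumPy training step. This is an illustrative stand-in, not the authors' implementation: the quadratic toy loss and the `AdvantageNet`-style flat parameter vector are hypothetical, and only the hyperparameter values come from the paper.

```python
import numpy as np

# Hyperparameters reported in the paper for the FHP experiments
# (HULH instead uses 32,000 SGD iterations and a batch size of 20,000).
LEARNING_RATE = 1e-3
CLIP_NORM = 1.0
BATCH_SIZE = 10_000
SGD_ITERS = 4_000

def clip_grad_norm(grad, max_norm):
    """Scale the gradient so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

class Adam:
    """Minimal Adam optimizer (Kingma & Ba, 2014) for a flat parameter vector."""
    def __init__(self, lr=LEARNING_RATE, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        # Exponential moving averages of the gradient and its square.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Bias-corrected estimates.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy demonstration: minimize ||params||^2 with clipped Adam updates.
rng = np.random.default_rng(0)
params = rng.normal(size=8)
opt = Adam()
for _ in range(5000):
    grad = 2 * params                       # gradient of the toy loss
    grad = clip_grad_norm(grad, CLIP_NORM)  # gradient norm clipping to 1
    params = opt.step(params, grad)
print(float(np.sum(params ** 2)))
```

In the paper each network is trained from the sampled advantage/strategy memories with mini-batches of the size given above; the toy loop here only exercises the optimizer and clipping logic.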