Fictitious Self-Play in Extensive-Form Games
Authors: Johannes Heinrich, Marc Lanctot, David Silver
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in imperfect-information poker games compare our approaches and demonstrate their convergence to approximate Nash equilibria. |
| Researcher Affiliation | Collaboration | Johannes Heinrich (J.HEINRICH@CS.UCL.AC.UK), University College London, UK; Marc Lanctot (LANCTOT@GOOGLE.COM), Google DeepMind, London, UK; David Silver (DAVIDSILVER@GOOGLE.COM), Google DeepMind, London, UK |
| Pseudocode | Yes | Algorithm 1 Full-width extensive-form fictitious play |
| Open Source Code | No | The paper does not provide any concrete access to source code (no specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper mentions games like Leduc Hold'em and River Poker, but it does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset used in the experiments. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | In particular, at each iteration k, FQI replayed 30 episodes with learning stepsize 0.05/(1 + 0.003k). It returned a policy that at each information state was determined by a Boltzmann distribution over the estimated Q-values, using temperature (1 + 0.02k)^(-1). The state of FQI was maintained across iterations... For each player i, FSP used a replay memory, M^i_RL, with space for 40000 episodes. Once this memory was full, FSP sampled 2 episodes from the strategy profile σ and 1 episode from (β^i, σ^{-i}) at each iteration for each player, respectively, i.e. we set n = 2 and m = 1 in Algorithm 2... We set the mixing parameter to η_k = α_{kγp}, where p = (n + m)/(Memory Size) and the constant γ controls how many iterations constitute one formal fictitious play iteration. In all our experiments, we used γ = 10. Both algorithms' average strategy profiles were initialized to a uniform distribution at each information state. Each algorithm trained for 300 seconds. |
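
The hyperparameter schedules quoted in the Experiment Setup row are compact enough to write down directly. Below is a minimal Python sketch of those schedules, assuming the reconstructed formulas above; the function names are ours, and the choice of fictitious-play stepsize inside `mixing_parameter` (α_j = 1/(j + 1) evaluated at a "formal" iteration j = kγp) is an assumption for illustration, not something the paper states explicitly.

```python
import numpy as np

def learning_stepsize(k: int) -> float:
    """FQI learning stepsize at iteration k, as quoted: 0.05 / (1 + 0.003 k)."""
    return 0.05 / (1.0 + 0.003 * k)

def boltzmann_temperature(k: int) -> float:
    """Boltzmann temperature at iteration k, as quoted: (1 + 0.02 k)^(-1)."""
    return 1.0 / (1.0 + 0.02 * k)

def boltzmann_policy(q_values: np.ndarray, temperature: float) -> np.ndarray:
    """Softmax over the estimated Q-values at one information state."""
    logits = q_values / temperature
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def mixing_parameter(k: int, n: int = 2, m: int = 1,
                     memory_size: int = 40000, gamma: float = 10.0) -> float:
    """Mixing parameter eta_k with p = (n + m) / memory_size, as quoted.
    Assumption: alpha_j = 1 / (j + 1) at the formal iteration j = k * gamma * p.
    """
    p = (n + m) / memory_size
    return 1.0 / (k * gamma * p + 1.0)

# Example: policy at a 3-action information state after k = 100 iterations
k = 100
q = np.array([0.2, -0.1, 0.05])
pi = boltzmann_policy(q, boltzmann_temperature(k))
print(learning_stepsize(k), mixing_parameter(k), pi)
```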