Learning Not to Regret

Authors: David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validated our algorithms' faster convergence on a distribution of river poker games. Our experiments show that the meta-learned algorithms outpace their non-meta-learned counterparts, achieving more than tenfold improvements.
Researcher Affiliation | Collaboration | David Sychrovský (1,2), Michal Šustr (2,5), Elnaz Davoodi (3), Michael Bowling (4), Marc Lanctot (3), Martin Schmid (1,5); 1 Department of Applied Mathematics, Charles University; 2 Artificial Intelligence Center, Czech Technical University; 3 Google DeepMind; 4 Department of Computing Science, University of Alberta; 5 EquiLibre Technologies
Pseudocode | Yes | Algorithm 1: Predictive regret matching (Farina, Kroer, and Sandholm 2021). (An illustrative code sketch of this update is given below the table.)
Open Source Code | Yes | We wrote a custom solver for river poker which outperforms other publicly available solvers. We made the solver available at https://github.com/DavidSych/RivPy.
Open Datasets | No | The paper describes generating a distribution from a modified Rock Paper Scissors game and using the public root state of river poker. It states, "The distribution G is generated by sampling public cards uniformly, while the player beliefs are sampled in the same way as in (Moravcik et al. 2017)." While it references a prior work, it does not provide concrete access information such as a direct link, DOI, or specific citation (with authors and year within the text) for a publicly available dataset itself.
Dataset Splits | No | The paper states that "Other hyperparameters were found via a grid search," implying a validation process was used for tuning. However, it does not explicitly provide details about a specific 'validation dataset split', such as percentages or sample counts, nor does it specify how this split was created or accessed.
Hardware Specification | No | The paper mentions, "We ran these experiments using a single CPU thread." While this specifies the number of CPU threads, it does not provide concrete hardware details such as the specific CPU model, GPU model, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions using "a two layer LSTM" for the neural network architecture and the "Adam optimizer." However, it does not provide specific version numbers for any software libraries (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other key software components used in the implementation or for running experiments.
Experiment Setup | Yes | We minimize objective (2) for T = 64 iterations over 512 epochs using the Adam optimizer. Other hyperparameters were found via a grid search. For both NOA and NPRM, the neural network architecture we use is a two-layer LSTM. For NOA, these two layers are followed by a fully-connected layer with the softmax activation. For NPRM, we additionally scale all outputs by α_max², ensuring any regret vector can be represented by the network. We use cosine learning rate decay from 10⁻³ to 3·10⁻⁴. (A rough configuration sketch is given below the table.)
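
Below is a minimal, illustrative sketch of the predictive regret matching update named in the Pseudocode row, written for a two-player zero-sum matrix game. It is not the authors' code: the prediction here is simply the previous instantaneous regret, whereas the paper's meta-learned variant (NPRM) would obtain it from a trained network, and the helper names (`prm`, `normalize`) and the rock-paper-scissors payoff matrix are choices made for this example only.

```python
# Illustrative predictive regret matching (PRM) for a zero-sum matrix game,
# in the spirit of Algorithm 1 (Farina, Kroer, and Sandholm 2021).
# Here the prediction of the next regret is the last observed regret;
# the meta-learned NPRM of the paper would produce it with a neural network.
import numpy as np

def normalize(v):
    """Threshold regrets at zero and normalise to a strategy (uniform if all zero)."""
    v = np.maximum(v, 0.0)
    s = v.sum()
    return v / s if s > 0 else np.full_like(v, 1.0 / len(v))

def prm(payoff, iters=64):
    """Self-play PRM on a matrix game given the row player's payoff matrix."""
    n, m = payoff.shape
    R1, R2 = np.zeros(n), np.zeros(m)       # cumulative regrets
    p1, p2 = np.zeros(n), np.zeros(m)       # predictions of the next regret
    avg1, avg2 = np.zeros(n), np.zeros(m)   # running sums of strategies
    for _ in range(iters):
        x = normalize(R1 + p1)              # predictive step, row player
        y = normalize(R2 + p2)              # predictive step, column player
        u1, u2 = payoff @ y, -payoff.T @ x  # action values against the opponent
        r1, r2 = u1 - x @ u1, u2 - y @ u2   # instantaneous regrets
        R1, R2 = R1 + r1, R2 + r2           # accumulate regrets
        p1, p2 = r1, r2                     # predict: next regret = last regret
        avg1, avg2 = avg1 + x, avg2 + y
    return avg1 / iters, avg2 / iters       # average strategies

if __name__ == "__main__":
    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    print(prm(rps))  # average strategies approach the uniform equilibrium
```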
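
The Experiment Setup row fixes only a handful of details: Adam, 512 epochs, T = 64 iterations, a two-layer LSTM with a fully-connected softmax head (for NOA), and cosine learning-rate decay from 10⁻³ to 3·10⁻⁴. A rough PyTorch sketch of that configuration follows; the hidden width, input feature size, number of actions, the per-epoch rollout, and the dummy loss standing in for objective (2) are placeholders, not values from the paper.

```python
# Rough sketch of the quoted training configuration, assuming PyTorch.
# Only Adam, 512 epochs, T = 64 iterations, the two-layer LSTM + softmax head,
# and the 1e-3 -> 3e-4 cosine decay come from the excerpt; everything else
# (sizes, the random rollout, the surrogate loss) is a placeholder.
import torch
import torch.nn as nn

class NOAHead(nn.Module):
    """Two-layer LSTM followed by a fully-connected layer with softmax (NOA)."""
    def __init__(self, in_dim=16, hidden=128, n_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return torch.softmax(self.head(out), dim=-1), state

model = NOAHead()
epochs, T = 512, 64
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Cosine decay from 1e-3 down to 3e-4; decaying over the epochs is an assumption.
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs, eta_min=3e-4)

for epoch in range(epochs):
    feats = torch.randn(1, T, 16)   # placeholder for a length-T solver unroll
    strategy, _ = model(feats)
    loss = -strategy.log().mean()   # dummy surrogate for objective (2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
```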