Learning Deviation Payoffs in Simulation-Based Games

Authors: Samuel Sokota, Caleb Ho, Bryce Wiedenbeck

AAAI 2019, pp. 2173-2180

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that deviation payoff learning identifies better approximate equilibria than previous methods and can handle more difficult settings, including games with many more players, strategies, and roles.
Researcher Affiliation | Academia | Samuel Sokota (Swarthmore College, sokota@ualberta.ca); Caleb Ho (Swarthmore College, caleb.yh.ho@gmail.com); Bryce Wiedenbeck (Swarthmore College, bwieden1@swarthmore.edu)
Pseudocode | Yes | Algorithm 1: Approximating Role-Symmetric Nash Equilibria
Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper states "we generate action-graph games to serve as a proxy for simulators" but does not provide access information (link, citation, or repository) for these generated datasets.
Dataset Splits | Yes | We split queries half and half between the initial sample and resampling for our experiments and used ten iterations of resampling.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper describes the neural network architecture but does not specify any software dependencies (libraries or frameworks) with version numbers.
Experiment Setup | Yes | In the following experiments, we employed a network with three dense hidden layers of 128, 64, and 32 nodes, followed by a head for each strategy with 16 hidden nodes and a single output. This architecture was tuned based on 100-player, five-pure-strategy games. We held the structure fixed throughout our experiments, other than varying the number of input nodes and heads to match the number of pure strategies. We split queries half and half between the initial sample and resampling for our experiments and used ten iterations of resampling. For resampling, we chose the variance kI of the distribution we sampled from such that the expected distance between a random sample and the mean was about 0.05 (this requires a different constant k for each dimension). Hedged code sketches of this setup appear after the table.
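
The quoted architecture (a shared trunk of 128, 64, and 32 dense units plus one 16-unit head per pure strategy) is concrete enough to sketch. Below is a minimal PyTorch rendering; the paper does not name its framework, and the ReLU activations, the class name `DeviationPayoffNet`, and the input encoding (a mixed-strategy profile over all pure strategies) are assumptions.

```python
# Minimal sketch of the multi-headed deviation-payoff network described above.
# Framework choice (PyTorch) and activation functions are assumptions.
import torch
import torch.nn as nn

class DeviationPayoffNet(nn.Module):
    """Maps a mixed-strategy profile to one deviation-payoff estimate per strategy."""

    def __init__(self, num_strategies: int):
        super().__init__()
        # Shared trunk: three dense hidden layers of 128, 64, and 32 nodes.
        self.trunk = nn.Sequential(
            nn.Linear(num_strategies, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # One head per pure strategy: 16 hidden nodes and a single output.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
            for _ in range(num_strategies)
        ])

    def forward(self, mixture: torch.Tensor) -> torch.Tensor:
        shared = self.trunk(mixture)
        # Concatenate the per-strategy head outputs into a deviation-payoff vector.
        return torch.cat([head(shared) for head in self.heads], dim=-1)

# Example: a symmetric game with five pure strategies.
net = DeviationPayoffNet(num_strategies=5)
payoffs = net(torch.rand(1, 5))  # shape (1, 5)
```

Changing `num_strategies` adjusts both the input width and the number of heads, matching the paper's note that only the input nodes and heads were varied across games.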
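
The resampling description fixes the expected sample-to-mean distance at about 0.05 and notes that the constant k depends on the dimension. Assuming an isotropic Gaussian N(mu, kI), one natural way to pick k is the closed form for the expected norm of a Gaussian vector; the paper does not state that this exact formula was used, so treat the sketch as an illustration.

```python
# Sketch: choose the resampling variance k so that the expected distance between
# a sample from N(mu, k*I_d) and its mean mu is roughly 0.05.
# The closed form for an isotropic Gaussian is an assumption about how k was chosen.
import numpy as np
from scipy.special import gammaln

def variance_for_expected_distance(dim: int, target: float = 0.05) -> float:
    # For X ~ N(mu, k*I_d):  E||X - mu|| = sqrt(2k) * Gamma((d+1)/2) / Gamma(d/2)
    c_d = np.sqrt(2.0) * np.exp(gammaln((dim + 1) / 2.0) - gammaln(dim / 2.0))
    return (target / c_d) ** 2

for d in (2, 5, 10):
    k = variance_for_expected_distance(d)
    samples = np.random.normal(0.0, np.sqrt(k), size=(100_000, d))
    # Empirical mean distance should be close to 0.05 for each dimension.
    print(d, k, np.linalg.norm(samples, axis=1).mean())
```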
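
Putting the quoted pieces together (half the query budget spent on an initial sample, ten resampling iterations near a candidate equilibrium, Gaussian resampling with variance kI), the overall procedure can be sketched as follows. The callables `simulate`, `train_regressor`, and `find_candidate_equilibrium` are hypothetical placeholders, and the clip-and-renormalize step onto the simplex is an assumption; this is not a transcription of the paper's Algorithm 1.

```python
# High-level sketch of the sample/train/resample loop implied by the quoted setup.
# All helper callables are placeholders, not the paper's API.
import numpy as np

def approximate_equilibrium(simulate, train_regressor, find_candidate_equilibrium,
                            num_strategies, query_budget, resample_iters=10, k=1e-3):
    # Spend half the query budget on an initial sample of mixed-strategy profiles.
    initial = query_budget // 2
    mixtures = np.random.dirichlet(np.ones(num_strategies), size=initial)
    data = [(m, simulate(m)) for m in mixtures]          # simulator queries
    per_iter = (query_budget - initial) // resample_iters

    for _ in range(resample_iters):
        model = train_regressor(data)                    # e.g. the network sketched above
        candidate = find_candidate_equilibrium(model)    # e.g. replicator dynamics
        # Resample near the candidate with variance k*I, then renormalize onto the simplex.
        noisy = candidate + np.random.normal(0.0, np.sqrt(k), size=(per_iter, num_strategies))
        noisy = np.clip(noisy, 1e-6, None)
        noisy /= noisy.sum(axis=1, keepdims=True)
        data += [(m, simulate(m)) for m in noisy]

    model = train_regressor(data)
    return find_candidate_equilibrium(model)
```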