Learning Deviation Payoffs in Simulation-Based Games
Authors: Samuel Sokota, Caleb Ho, Bryce Wiedenbeck
AAAI 2019, pp. 2173-2180 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that deviation payoff learning identifies better approximate equilibria than previous methods and can handle more difficult settings, including games with many more players, strategies, and roles. |
| Researcher Affiliation | Academia | Samuel Sokota Swarthmore College sokota@ualberta.ca Caleb Ho Swarthmore College caleb.yh.ho@gmail.com Bryce Wiedenbeck Swarthmore College bwieden1@swarthmore.edu |
| Pseudocode | Yes | Algorithm 1 Approximating Role-Symmetric Nash Equilibria |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper states "we generate action-graph games to serve as a proxy for simulators" but does not provide access information (link, citation, or repository) for these generated datasets. |
| Dataset Splits | Yes | We split queries half and half between the initial sample and resampling for our experiments and used ten iterations of resampling. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper describes the neural network architecture but does not specify any software dependencies (libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | In the following experiments, we employed a network with three dense hidden layers of 128, 64, and 32 nodes, followed by a head for each strategy with 16 hidden nodes and a single output. This architecture was tuned based on 100 player, five pure-strategy games. We held the structure fixed throughout our experiments, other than varying the number of input nodes and heads to match the number of pure strategies. We split queries half and half between the initial sample and resampling for our experiments and used ten iterations of resampling. For resampling, we chose the variance kI of the distribution we sampled from such that the expected distance between a random sample and the mean was about 0.05 (this requires a different constant k for each dimension). |
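
To make the architecture quoted in the "Experiment Setup" row concrete, here is a minimal sketch, assuming a PyTorch implementation (the paper does not name a framework) and assuming the network maps a mixed-strategy profile to one deviation-payoff estimate per pure strategy. The layer sizes (128, 64, 32, and a 16-node head per strategy) come from the quoted description; the class name `DeviationPayoffNet`, the ReLU activations, and the input encoding are illustrative assumptions.

```python
# Sketch of the described deviation-payoff network (layer sizes from the paper;
# framework, activations, and naming are assumptions made for illustration).
import torch
import torch.nn as nn


class DeviationPayoffNet(nn.Module):
    def __init__(self, num_strategies: int):
        super().__init__()
        # Shared trunk: three dense hidden layers of 128, 64, and 32 nodes.
        self.trunk = nn.Sequential(
            nn.Linear(num_strategies, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # One head per pure strategy: 16 hidden nodes and a single output,
        # interpreted here as the estimated deviation payoff for that strategy.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
            for _ in range(num_strategies)
        )

    def forward(self, mixture: torch.Tensor) -> torch.Tensor:
        # mixture: batch of mixed-strategy profiles, shape (batch, num_strategies).
        shared = self.trunk(mixture)
        # Concatenate per-strategy head outputs into a (batch, num_strategies) tensor.
        return torch.cat([head(shared) for head in self.heads], dim=1)


# Example: a five-pure-strategy game, matching the tuning setting quoted above.
net = DeviationPayoffNet(num_strategies=5)
payoffs = net(torch.rand(8, 5))  # -> shape (8, 5)
```

Only the number of input nodes and heads changes with the number of pure strategies, mirroring the statement that the rest of the structure was held fixed across experiments.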
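The remark that the resampling variance kI "requires a different constant k for each dimension" can also be made concrete. Assuming the resampling distribution is an isotropic Gaussian with covariance kI in d dimensions (the paper states only the target expected distance of about 0.05, not the distribution family), the expected distance of a draw from its mean is sqrt(k) times the mean of a chi distribution with d degrees of freedom, so k can be solved in closed form. The helper name below is hypothetical.

```python
# Hypothetical helper: choose k so that a draw from N(mu, k*I) in d dimensions
# lies, in expectation, about 0.05 away from mu. The Gaussian assumption is
# ours; the paper only fixes the target distance and notes that k depends on d.
import math


def variance_constant(d: int, target_distance: float = 0.05) -> float:
    # For x ~ N(mu, k*I), ||x - mu|| = sqrt(k) * chi_d and
    # E[chi_d] = sqrt(2) * Gamma((d + 1) / 2) / Gamma(d / 2).
    expected_chi = math.sqrt(2) * math.gamma((d + 1) / 2) / math.gamma(d / 2)
    return (target_distance / expected_chi) ** 2


# k shrinks as the number of pure strategies (dimensions) grows.
for d in (2, 5, 10):
    print(d, variance_constant(d))
```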