Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers

Authors: Luke Marris, Ian Gemp, Thomas Anthony, Andrea Tacchetti, Siqi Liu, Karl Tuyls

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental

Traditionally, the performance of NE and (C)CE solvers has been evaluated by the time taken to converge to a solution within some tolerance. Feedforward neural networks can produce batches of solutions quickly and deterministically; for non-trivial games this is much faster than an iterative solver could hope to achieve. We therefore focus our evaluation on the trade-offs of the neural network solver, namely (i) how long it takes to train, and (ii) how accurate the solutions are. For the latter we use two metrics:

Solver Gap: $\tfrac{1}{2} \sum_{a} |\sigma^{*}(a) - \sigma(a)|$

(C)CE Gap: $\max_{p} \max \big[ \textstyle\sum_{a} A^{p}(\cdot, a)\, \sigma(a) \big]^{+}$

The first (Solver Gap) measures the distance to the exact unique solution found by an iterative solver, $\sigma^{*}(a)$; it is bounded between 0 and 1 and is zero for a perfect prediction. The second ((C)CE Gap) measures the distance to the equilibrium solution polytope and is zero if the joint lies within the polytope.

Parameterization Sweeps
We show performance across a number of parameterizations, including (i) different equilibrium selection criteria (Figure 3a), (ii) different shapes of games (Figure 3b), and (iii) different solution concepts (Figure 3c).

Classes of Games
It is known that some distributions of game payoffs are harder for some methods to solve than others [51]. We compare performance across a number of classes of transfer games (Appendix Table 4) for a single MECCE and a single MECE [48] Neural Equilibrium Solver trained on 8 × 8 games. Figure 4 shows the worst, mean, and best performance over 512 samples from each class in terms of (i) distance to any equilibrium and (ii) distance to the target equilibrium found by an iterative solver. We also plot the performance of a uniform joint as a naive baseline, since the gap can be artificially reduced by scaling the payoffs. With regard to equilibrium violation, ME is tricky because it lies on the boundary of the polytope, so some violation is expected in an approximate setting. The plots showing the failure rate and run time of the iterative solver give intuition for which classes are difficult. The baseline iterative solver takes about 0.05s to solve a single game; the network can solve a batch of 4096 games in 0.0025s. We see that for most classes the NES is very accurate, with a solver gap of around $10^{-2}$. Some classes of games are indeed more difficult, and these align with games that iterative equilibrium solvers struggle with. This hints that difficult games are ill-conditioned.

Ablations
We show the performance (Figure 3d) of the proposed method compared with (i) a linear network, and (ii) no invariant pre-processing with naive payoff sampling (each element sampled from a uniform distribution). Both result in a significant reduction in performance.

Scaling
Due to the size of the payoff representation, $G^{p}(a)$, the inputs and therefore the activations of the network grow significantly with the number of joint strategies in the game. Without further work on sparser payoff representations, NES is therefore limited by the size of its payoff inputs. For further discussion see Section 7. Nevertheless, Table 1 shows good performance when scaling to moderately sized games. Note that the solver gap metric is incomplete on larger games because the ECOS evaluation solver fails to converge.

Generalization
An interesting property of the NES architecture is that its parameters do not depend on the number of strategies in the game. We can therefore test the generalization ability of the network zero-shot on games with different numbers of strategies (Table 2). There are two observations: (i) NES only weakly generalizes to other game sizes under the solver gap metric, and (ii) NES strongly generalizes to larger games under the CCE gap, remarkably achieving zero violation. This could be mitigated by training the network on a mixture of game sizes, which we leave to future work.
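To make the two evaluation metrics concrete, here is a minimal sketch in Python/NumPy for the two-player (bimatrix) case. The function names, the bimatrix restriction, and the variable layout are our own illustrative choices; the paper defines the metrics for general n-player games via a deviation-gain operator $A^{p}$.

```python
import numpy as np

def solver_gap(sigma_star, sigma):
    """Total variation distance between the iterative solver's joint, sigma_star,
    and the network's joint, sigma. Bounded in [0, 1]; zero for a perfect match."""
    return 0.5 * np.abs(sigma_star - sigma).sum()

def cce_gap(A, B, sigma):
    """Maximum positive unconditional-deviation gain for a bimatrix game.
    A, B: (m, n) payoff matrices for the row and column players.
    sigma: (m, n) joint distribution over joint actions.
    Zero iff sigma lies inside the CCE polytope."""
    ev_row = float((sigma * A).sum())        # row player's expected payoff
    ev_col = float((sigma * B).sum())
    col_marginal = sigma.sum(axis=0)         # what the row player deviates against
    row_marginal = sigma.sum(axis=1)
    gain_row = A @ col_marginal - ev_row     # gain for each fixed row deviation
    gain_col = B.T @ row_marginal - ev_col   # gain for each fixed column deviation
    return max(gain_row.max(), gain_col.max(), 0.0)

# Example: the uniform joint is the unique (C)CE of matching pennies.
A = np.array([[1., -1.], [-1., 1.]])
sigma = np.full((2, 2), 0.25)
print(solver_gap(sigma, sigma))  # 0.0
print(cce_gap(A, -A, sigma))     # 0.0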
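The size-generalization result is possible because the architecture's parameters act per joint action rather than per strategy. The toy network below is a hypothetical illustration of that property only, not the paper's actual NES architecture (which additionally uses invariant pre-processing): its weight shapes depend only on the number of players and a hidden width, so the same parameters apply unchanged to games of any shape.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 32
N_PLAYERS = 2

# Parameter shapes depend only on the number of players and the hidden width,
# never on the number of actions, so one set of weights fits any game shape.
W1 = rng.normal(size=(N_PLAYERS, HIDDEN)) / np.sqrt(N_PLAYERS)
W2 = rng.normal(size=(HIDDEN, 1)) / np.sqrt(HIDDEN)

def joint_distribution(G):
    """G: payoff tensor of shape (n_players, *action_counts).
    Applies a shared per-joint-action MLP to each joint action's payoff vector,
    then a softmax over all joint actions, returning a joint distribution."""
    feats = G.reshape(N_PLAYERS, -1).T   # (num_joint_actions, n_players)
    h = np.tanh(feats @ W1)              # same weights at every joint action
    logits = (h @ W2).ravel()
    sigma = np.exp(logits - logits.max())
    return (sigma / sigma.sum()).reshape(G.shape[1:])

# The same parameters produce a joint for a 3x3 game and a 5x7 game alike.
print(joint_distribution(rng.normal(size=(2, 3, 3))).shape)  # (3, 3)
print(joint_distribution(rng.normal(size=(2, 5, 7))).shape)  # (5, 7)
```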
Researcher Affiliation | Collaboration | Luke Marris (DeepMind, UCL; marris@deepmind.com), Ian Gemp (DeepMind), Thomas Anthony (DeepMind), Andrea Tacchetti (DeepMind), Siqi Liu (DeepMind, UCL)
Pseudocode | No | The paper describes the model and methods in prose and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | We have not included code. We have described the approach in enough detail that a competent practitioner should be able to reproduce our results in less than a week.
Open Datasets | Yes | We compare performance across a number of classes of transfer games (Appendix Table 4) for a single MECCE and a single MECE [48] Neural Equilibrium Solver trained on 8 × 8 games.
Dataset Splits | No | The paper describes training on sampled games and evaluating on 512 or 128 sampled games for different tests and for generalization, but does not provide explicit train/validation/test splits (e.g., percentages or exact counts for each split).
Hardware Specification | No | The paper's checklist states that hardware details are in the appendix, but the appendix content is not included in the provided paper text, so no specific hardware details are present.
Software Dependencies | No | The paper mentions software frameworks such as TensorFlow, JAX, Haiku, Optax, CVXPY, ECOS, and PyTorch, but does not specify their version numbers.
Experiment Setup | Yes | The approximation weight and the welfare weight, $\mu$, are hyperparameters that control the balance of the optimization. The maximum approximation parameter, $\epsilon^{+}_{p}$, is another constant that is usually chosen to be equal to the payoff scale (Section 4.1). The approximation term is designed to have a similar form to the relative entropy and is maximal when $\hat{\epsilon}_{p} = \epsilon_{p}$. We refer to this equilibrium selection framework as Target Approximate Maximum Welfare Minimum Relative Entropy ($\hat{\epsilon}$-MWMRE).
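As a rough illustration of the trade-off these hyperparameters control, the sketch below scalarizes welfare against relative entropy to the uniform joint using the welfare weight µ. This is a simplified stand-in under our own sign conventions; the paper's full $\hat{\epsilon}$-MWMRE objective also includes the approximation term and is optimized subject to the (C)CE constraints.

```python
import numpy as np

def selection_score(payoffs, sigma, mu):
    """Hypothetical scalarization of the welfare / relative-entropy trade-off.
    payoffs: per-player payoff vectors over joint actions (flattened).
    sigma:   a joint distribution over joint actions (flattened).
    mu:      welfare weight; larger values favor high-welfare equilibria,
             smaller values favor joints closer to uniform."""
    welfare = sum(float(G @ sigma) for G in payoffs)
    uniform = 1.0 / sigma.size
    rel_entropy = float(np.sum(sigma * np.log(np.maximum(sigma, 1e-12) / uniform)))
    return mu * welfare - rel_entropy

# A joint closer to uniform scores better at mu = 0; welfare dominates as mu grows.
G = [np.array([3., 0., 0., 1.]), np.array([1., 0., 0., 3.])]  # flattened 2x2 game
print(selection_score(G, np.full(4, 0.25), mu=0.0))           # 0.0 (min rel. entropy)
print(selection_score(G, np.array([.7, .1, .1, .1]), mu=5.0))
```

Maximizing such a score over the equilibrium polytope, rather than over all joints, is what turns this scalarization into an equilibrium selection problem.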