Inequity aversion improves cooperation in intertemporal social dilemmas
Authors: Edward Hughes, Joel Z. Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, Heather Roff, Thore Graepel
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider two dilemmas in this paper, one of the public goods type and one of the commons type. Each was implemented as a partially observable Markov game on a 2D grid. Both are also intertemporal social dilemmas because individually selfish actions produce immediate benefits while their impacts on the collective develop over a longer time horizon. The availability of costly punishment is of critical importance in human sequential social dilemmas [47, 48] and is therefore an action in the environments presented here. In the Cleanup game, the aim is to collect apples from a field. Each apple provides a reward of 1. (...) We show that advantageous inequity aversion is able to resolve certain intertemporal social dilemmas without resorting to punishment by providing a temporally correct intrinsic reward. For this mechanism to be effective, the population must have sufficiently many advantageous-inequity-averse individuals. By contrast disadvantageous-inequity-averse agents can drive mutual cooperation even in small numbers. They achieve this by punishing defectors at a time concomitant with their offences. In addition, we find that advantageous inequity aversion is particularly effective for resolving public goods dilemmas, whereas disadvantageous inequity aversion is more powerful for addressing commons dilemmas. Our baseline A3C agent fails to find socially beneficial outcomes in either category of game. We define the metrics used to quantify our results in the supplementary information. |
| Researcher Affiliation | Collaboration | Edward Hughes, Joel Z. Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, Heather Roff, Thore Graepel. DeepMind, London, United Kingdom {edwardhughes, jzl, karltuyls, duenez, antoniogc, idunning, tinazhu, kevinrmckee, rkoster, hroff, thore}@google.com, matthew.phillips.12@ucl.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides links to YouTube videos (e.g., 'https://youtu.be/N8BUzzFx7uQ') but does not contain any statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | No | The paper describes custom-implemented game environments ('Cleanup game' and 'Harvest game') but does not provide concrete access information (e.g., a specific link, DOI, or formal citation for a publicly available dataset) to these environments or any data generated within them. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'asynchronous advantage actor-critic (A3C)' and deep neural networks, but it does not specify any software dependencies with version numbers (e.g., specific deep learning frameworks like TensorFlow or PyTorch with their versions). |
| Experiment Setup | Yes | Gradients are generated asynchronously by 24 independent copies of each agent, playing simultaneously in distinct instantiations of the environment. (...) In a sweep over values for α and β, we found our strongest results for α = 5 and β = 0.05. (...) where γ is the discount factor and λ is a hyperparameter. (...) After 1000 steps the episode ends, at which point the game resets to an initial state. (A hedged sketch of this reward shaping appears after the table.) |
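
The Research Type and Experiment Setup rows quote the paper's inequity-aversion mechanism and the reported coefficients α = 5 and β = 0.05. Below is a minimal NumPy sketch of that reward shaping, assuming the Fehr-Schmidt-style subjective reward over temporally smoothed rewards that the paper describes; the numeric values of `GAMMA` and `LAMBDA`, the function names, and the five-agent example are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hedged sketch of inequity-averse reward shaping (Fehr-Schmidt form over
# temporally smoothed rewards). Coefficients alpha and beta follow the sweep
# reported in the paper; GAMMA and LAMBDA values are illustrative assumptions.

GAMMA = 0.99    # discount factor (illustrative value, not quoted in the rows above)
LAMBDA = 0.975  # smoothing hyperparameter lambda (illustrative value)
ALPHA = 5.0     # disadvantageous-inequity ("envy") coefficient from the sweep
BETA = 0.05     # advantageous-inequity ("guilt") coefficient from the sweep


def smooth_rewards(e_prev: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Temporally smoothed reward: e_i(t) = gamma * lambda * e_i(t-1) + r_i(t)."""
    return GAMMA * LAMBDA * e_prev + r


def subjective_rewards(r: np.ndarray, e: np.ndarray,
                       alpha: float = ALPHA, beta: float = BETA) -> np.ndarray:
    """Inequity-averse subjective reward for each of N agents:

    u_i = r_i - alpha/(N-1) * sum_j max(e_j - e_i, 0)
              - beta/(N-1)  * sum_j max(e_i - e_j, 0)
    """
    n = len(r)
    diff = e[None, :] - e[:, None]              # diff[i, j] = e_j - e_i
    envy = np.maximum(diff, 0.0).sum(axis=1)    # others ahead of agent i
    guilt = np.maximum(-diff, 0.0).sum(axis=1)  # agent i ahead of others
    return r - alpha * envy / (n - 1) - beta * guilt / (n - 1)


# Example: one environment step for 5 agents.
e = np.zeros(5)                          # smoothed rewards start at zero
r = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # extrinsic rewards (e.g. apples collected)
e = smooth_rewards(e, r)
u = subjective_rewards(r, e)             # shaped rewards fed to each learner
print(u)
```

In this formulation, α penalizes falling behind the smoothed rewards of others (disadvantageous inequity, the envy-like term that motivates punishing defectors), while β penalizes pulling ahead (advantageous inequity, the guilt-like term); either regime discussed in the paper can be isolated by setting the other coefficient to zero.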