Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity
Authors: Martin Smit, Fernando P. Santos
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affects the analytically identified equilibria. We run our RL experiments with a population of 50 agents (45 in the majority group and 5 in the minority group). We fix the exploration rate µ and learning rate α to 0.1. Each simulation runs for 250,000 interactions and we run each simulation 50 times with a different seed. |
| Researcher Affiliation | Academia | Martin Smit, Fernando P. Santos Informatics Institute, University of Amsterdam {j.m.m.smit, f.p.santos}@uva.nl |
| Pseudocode | No | The paper provides mathematical equations for Q-value updates but does not include structured pseudocode or an algorithm block; a hedged sketch of such an update is given after the table. |
| Open Source Code | Yes | The source code for this paper (models, experiments, and figures) is available on GitHub. Appendix and code available at: www.github.com/sias-uva/ |
| Open Datasets | No | The paper describes a simulation environment with a 'well-mixed population of agents' and a 'donation game' rather than using a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes the population setup for its simulations but does not provide specific training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers required to reproduce the experiment. |
| Experiment Setup | Yes | We fix the exploration rate µ and learning rate α to 0.1. Each simulation runs for 250,000 interactions and we run each simulation 50 times with a different seed. The rate of agent execution errors and judgement execution errors is relatively rare at 1%, and the benefit-to-cost ratio in our analytical model is 5 with c = 1, b = 5. Furthermore, the majority group comprises 90% of the population, and agents in different groups are functionally identical. |
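As noted in the Pseudocode row, the paper states its Q-value updates as equations rather than an algorithm block. The sketch below shows how such an update might look for a donor conditioning on the recipient's reputation, using the reported α = 0.1, µ = 0.1, b = 5, and c = 1. The table layout, function names, and the stateless bandit-style form of the update are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hyperparameters reported in the paper.
ALPHA = 0.1      # learning rate (alpha)
MU = 0.1         # exploration rate (mu)
B, C = 5.0, 1.0  # benefit and cost of a donation (b = 5, c = 1)

rng = np.random.default_rng(0)

# Q[reputation, action]: the donor conditions on the recipient's
# reputation (0 = bad, 1 = good); actions are 0 = defect, 1 = donate.
Q = np.zeros((2, 2))

def choose_action(recipient_reputation: int) -> int:
    """Epsilon-greedy selection with exploration rate MU."""
    if rng.random() < MU:
        return int(rng.integers(2))
    return int(np.argmax(Q[recipient_reputation]))

def q_update(recipient_reputation: int, action: int, reward: float) -> None:
    """Bandit-style update: Q <- Q + alpha * (reward - Q)."""
    Q[recipient_reputation, action] += ALPHA * (reward - Q[recipient_reputation, action])
```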
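The Experiment Setup row lists the population and run parameters. The driver below is a sketch under those settings, assuming uniform random donor/recipient pairing, a placeholder random policy, and a simple image-scoring norm (donating earns a good reputation, with a 1% judgement error); none of this structure is taken from the released code.

```python
import numpy as np

# Population and run settings reported in the paper.
N_AGENTS = 50
N_MAJORITY = 45           # 90% majority group, 10% minority group
N_INTERACTIONS = 250_000  # interactions per simulation
N_SEEDS = 50              # independent runs, one seed each
ERROR_RATE = 0.01         # execution and judgement error rates

def run(seed: int) -> tuple[float, float]:
    """One run: random donor/recipient pairs; an intended donation
    fails with probability ERROR_RATE (execution error), and the
    judge mis-assigns the reputation with probability ERROR_RATE."""
    rng = np.random.default_rng(seed)
    groups = np.array([0] * N_MAJORITY + [1] * (N_AGENTS - N_MAJORITY))
    reputations = np.ones(N_AGENTS, dtype=int)  # everyone starts 'good'
    for _ in range(N_INTERACTIONS):
        donor, recipient = rng.choice(N_AGENTS, size=2, replace=False)
        intends_to_donate = bool(rng.integers(2))  # placeholder policy
        donated = intends_to_donate and rng.random() >= ERROR_RATE
        judged_good = donated if rng.random() >= ERROR_RATE else not donated
        reputations[donor] = int(judged_good)
    # Fraction of each group holding a good reputation at the end.
    return reputations[groups == 0].mean(), reputations[groups == 1].mean()

results = [run(seed) for seed in range(N_SEEDS)]
```

Replacing the placeholder policy with the epsilon-greedy learner sketched above (and a social norm that maps donor action and recipient reputation to a new reputation) would give the full learning loop the paper describes.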