Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

Authors: Martin Smit, Fernando P. Santos

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affects the analytically identified equilibria." "We run our RL experiments with a population of 50 agents (45 in the majority group and 5 in the minority group). We fix the exploration rate µ and learning rate α to 0.1. Each simulation runs for 250,000 interactions and we run each simulation 50 times with a different seed."
Researcher Affiliation | Academia | "Martin Smit, Fernando P. Santos. Informatics Institute, University of Amsterdam. {j.m.m.smit, f.p.santos}@uva.nl"
Pseudocode | No | The paper provides mathematical equations for Q-value updates but does not include structured pseudocode or an algorithm block (a hedged sketch of such an update follows the table).
Open Source Code | Yes | The source code for this paper (models, experiments, and figures) is available on GitHub. "Appendix and code available at: www.github.com/sias-uva/"
Open Datasets | No | The paper describes a simulated "well-mixed population of agents" playing a "donation game" rather than using a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes the population setup for its simulations but does not provide training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers needed to reproduce the experiments.
Experiment Setup | Yes | "We fix the exploration rate µ and learning rate α to 0.1. Each simulation runs for 250,000 interactions and we run each simulation 50 times with a different seed. The rate of agent execution errors and judgement execution errors is relatively rare at 1%, and the benefit-to-cost ratio in our analytical model is 5 with c = 1, b = 5. Furthermore, the majority group comprises 90% of the population, and agents in different groups are functionally identical." (A configuration sketch collecting these values follows the table.)
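
As noted in the Pseudocode row, the paper states its Q-value updates only as equations. The sketch below is a minimal, hypothetical rendering of such an update for a donation game with reputation-conditioned actions. The function names, the ε-greedy exploration, and the stateless (bandit-style) update target are assumptions for illustration, not the authors' algorithm; only the values α = 0.1 and µ = 0.1 come from the paper.

```python
import random

ALPHA = 0.1  # learning rate (alpha = 0.1 in the paper)
MU = 0.1     # exploration rate (mu = 0.1 in the paper)

def choose_action(q_values, reputation):
    """Epsilon-greedy choice between defect (0) and cooperate (1),
    conditioned on the recipient's reputation (an assumption here)."""
    if random.random() < MU:
        return random.choice([0, 1])
    return max((0, 1), key=lambda a: q_values[(reputation, a)])

def q_update(q_values, reputation, action, payoff):
    """Move the Q-value for (reputation, action) toward the received
    payoff: a one-step, bandit-style update, assumed rather than taken
    from the paper's equations."""
    old = q_values[(reputation, action)]
    q_values[(reputation, action)] = old + ALPHA * (payoff - old)
```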
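
For anyone attempting a reproduction, the setup parameters quoted in the Research Type and Experiment Setup rows can be collected into a single configuration object. This is a hypothetical sketch; the field names are illustrative and do not come from the authors' repository, while the values are those reported in the paper.

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # Population: 50 agents, 90% in the majority group
    n_agents: int = 50
    n_majority: int = 45
    n_minority: int = 5
    # Learning parameters (mu and alpha are both 0.1 in the paper)
    exploration_rate: float = 0.1
    learning_rate: float = 0.1
    # Protocol: 250,000 interactions per run, 50 seeded runs
    n_interactions: int = 250_000
    n_runs: int = 50
    # Error rates: agent execution and judgement errors both at 1%
    action_error_rate: float = 0.01
    judgement_error_rate: float = 0.01
    # Donation game payoffs: c = 1, b = 5 (benefit-to-cost ratio of 5)
    cost: float = 1.0
    benefit: float = 5.0
```

Looping over `n_runs` with a fresh random seed per run would mirror the reported protocol of 50 independently seeded simulations.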