To Promote Full Cooperation in Social Dilemmas, Agents Need to Unlearn Loyalty

Authors: Chin-wing Leung, Tom Lenaerts, Paolo Turrini

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Multi-agent Q-learning with Boltzmann exploration is used to learn when to sever or maintain an association. In both the Prisoner's Dilemma and the Stag Hunt games we observe that the Out-for-Tat rewiring rule, breaking ties with other agents choosing socially undesirable actions, becomes dominant, confirming at the same time that cooperation flourishes when rewiring is fast enough relative to imitation. We conducted experiments over a population size of N = 1000 and a total number of iterations of H = 1,000,000.
Researcher Affiliation | Academia | Chin-wing Leung (1), Tom Lenaerts (2,3,4) and Paolo Turrini (1); (1) Department of Computer Science, University of Warwick; (2) Machine Learning Group, Université Libre de Bruxelles; (3) Artificial Intelligence Lab, Vrije Universiteit Brussel; (4) Center for Human-Compatible AI, UC Berkeley; chin-wing.leung@warwick.ac.uk, tom.lenaerts@ulb.be, p.turrini@warwick.ac.uk
Pseudocode | Yes | Algorithm 1: The Co-evolutionary Model
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of its source code.
Open Datasets | No | The paper describes a simulation setup ('Consider a network of agents where each agent is randomly connected with z neighbours and assigned an action for the underlying game...'), generating its own data, and does not provide a link or citation to a publicly available dataset.
Dataset Splits | No | The paper describes a simulation model in which agents learn, but it does not specify explicit training/validation/test splits for model evaluation.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., GPU or CPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions the algorithms used (e.g., Q-learning) but does not list specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | In line with our baseline model, we conducted experiments over a population size of N = 1000 and a total number of iterations of H = 1,000,000. Unless otherwise specified, the average neighbourhood size is z = 30, the learning rate α = 0.05, the inverse temperature for Q-learning τ = 5, and the inverse temperature for imitation β = 0.005.
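
For readers weighing what a reimplementation would involve, the Research Type and Experiment Setup rows above already pin down the learning rule (Q-learning with Boltzmann exploration over a sever/maintain rewiring decision) and the reported hyperparameters (α = 0.05, τ = 5, alongside population-level settings N = 1000, z = 30, H = 1,000,000, β = 0.005). The snippet below is a minimal, hypothetical Python sketch of just the Boltzmann action selection and one-step Q-update; it is not the paper's Algorithm 1 (The Co-evolutionary Model), it omits the imitation dynamics governed by β, and all names and the placeholder payoff are our own.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row (constant names are ours).
ALPHA = 0.05   # learning rate alpha
TAU = 5.0      # inverse temperature tau for Q-learning (Boltzmann exploration)

ACTIONS = ("maintain", "sever")   # rewiring choices for one association

def boltzmann_choice(q_values, tau=TAU, rng=np.random.default_rng()):
    """Sample an action index with probability proportional to exp(tau * Q)."""
    prefs = tau * np.asarray(q_values, dtype=float)
    prefs -= prefs.max()                      # numerical stabilisation
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

def q_update(q_values, action, reward, alpha=ALPHA):
    """One-step (stateless) Q-learning update for the chosen rewiring action."""
    q_values[action] += alpha * (reward - q_values[action])
    return q_values

# Usage: an agent deciding whether to keep a tie to one neighbour.
q = np.zeros(len(ACTIONS))        # Q-values for maintain/sever
a = boltzmann_choice(q)           # softmax exploration over the Q-values
payoff = 1.0                      # placeholder reward from the game round
q = q_update(q, a, payoff)
```

A full reproduction would wrap updates like these in the paper's co-evolutionary loop over N = 1000 agents and H = 1,000,000 iterations, interleaving rewiring with payoff-based imitation at inverse temperature β = 0.005, which this sketch does not attempt.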