To Promote Full Cooperation in Social Dilemmas, Agents Need to Unlearn Loyalty
Authors: Chin-wing Leung, Tom Lenaerts, Paolo Turrini
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Multi-agent Q-learning with Boltzmann exploration is used to learn when to sever or maintain an association. In both the Prisoner's Dilemma and the Stag Hunt games we observe that the Out-for-Tat rewiring rule, which breaks ties with agents choosing socially undesirable actions, becomes dominant, confirming that cooperation flourishes when rewiring is fast enough relative to imitation. We conducted experiments over a population size of N = 1000 and a total number of iterations of H = 1,000,000. |
| Researcher Affiliation | Academia | Chin-wing Leung (1), Tom Lenaerts (2,3,4) and Paolo Turrini (1). (1) Department of Computer Science, University of Warwick; (2) Machine Learning Group, Université Libre de Bruxelles; (3) Artificial Intelligence Lab, Vrije Universiteit Brussel; (4) Center for Human-Compatible AI, UC Berkeley. chin-wing.leung@warwick.ac.uk, tom.lenaerts@ulb.be, p.turrini@warwick.ac.uk |
| Pseudocode | Yes | Algorithm 1 The Co-evolutionary Model |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of its source code. |
| Open Datasets | No | The paper describes a simulation setup ('Consider a network of agents where each agent is randomly connected with z neighbours and assigned an action for the underlying game...'), generating its own data, and does not provide a link or citation to a publicly available dataset. |
| Dataset Splits | No | The paper describes a simulation model where agents learn, but it does not specify explicit training/validation/test dataset splits in the context of data partitioning for model evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms used (e.g., Q-learning) but does not list any specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | In line with our baseline model, we conducted experiments over a population size of N = 1000 and a total number of iterations of H = 1,000,000. Unless otherwise specified, the average neighbourhood size is z = 30, the learning rate α = 0.05, the inverse temperature for Q-learning τ = 5, and the inverse temperature for imitation β = 0.005. |
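
Since the source code is not released, the following is a minimal Python sketch of the kind of Boltzmann (softmax) action selection with a stateless Q-learning update described in the Research Type entry, as an agent might use for the sever/maintain rewiring decision. The function names, the two-action encoding, and the reward handling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def boltzmann_probs(q_values, tau):
    # Softmax over Q-values with inverse temperature tau; subtracting the max
    # keeps the exponentials numerically stable.
    prefs = np.exp(tau * (q_values - np.max(q_values)))
    return prefs / prefs.sum()

def choose_rewiring_action(q_values, tau, rng):
    # Sample an action index (here 0 = maintain the tie, 1 = sever it)
    # from the Boltzmann distribution over the agent's Q-values.
    return rng.choice(len(q_values), p=boltzmann_probs(q_values, tau))

def q_update(q_values, action, reward, alpha):
    # Stateless Q-learning update: nudge Q(action) toward the observed reward.
    q_values[action] += alpha * (reward - q_values[action])
    return q_values

# Example: one agent deciding whether to keep a link with a defecting neighbour.
rng = np.random.default_rng(0)
q = np.zeros(2)                      # Q-values for [maintain, sever]
a = choose_rewiring_action(q, tau=5.0, rng=rng)
q = q_update(q, a, reward=1.0, alpha=0.05)
```

With τ = 5 the sampling is fairly greedy once Q-values separate, while small Q-value differences still leave room for exploration, which matches the role of the inverse temperature reported in the setup.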
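The reported experiment parameters can also be collected into a single configuration object for a re-implementation attempt. The sketch below does exactly that, and adds a standard Fermi-style pairwise-comparison rule as one plausible reading of the imitation step governed by β; the paper's exact imitation and scheduling rules are specified in its Algorithm 1, which is not reproduced here, so the `imitation_probability` helper is an assumption.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ExperimentConfig:
    # Parameter values quoted in the Experiment Setup row above.
    N: int = 1000          # population size
    H: int = 1_000_000     # total number of iterations
    z: int = 30            # average neighbourhood size
    alpha: float = 0.05    # Q-learning rate
    tau: float = 5.0       # inverse temperature for Q-learning (Boltzmann exploration)
    beta: float = 0.005    # inverse temperature for imitation

def imitation_probability(payoff_self: float, payoff_neighbour: float, beta: float) -> float:
    # Fermi-style rule: the higher the neighbour's payoff relative to the
    # focal agent's, the more likely the focal agent is to copy its action.
    return 1.0 / (1.0 + np.exp(-beta * (payoff_neighbour - payoff_self)))

cfg = ExperimentConfig()
p = imitation_probability(payoff_self=2.0, payoff_neighbour=3.0, beta=cfg.beta)
```

Note that β = 0.005 makes imitation nearly random with respect to payoff differences, so the relative speed of rewiring versus imitation, highlighted in the paper's findings, is what drives the dominance of Out-for-Tat rather than strong payoff-based selection.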