To Promote Full Cooperation in Social Dilemmas, Agents Need to Unlearn Loyalty

Authors: Chin-wing Leung, Tom Lenaerts, Paolo Turrini

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Multi-agent Q-learning with Boltzmann exploration is used to learn when to sever or maintain an association. In both the Prisoner's Dilemma and the Stag Hunt games we observe that the Out-for-Tat rewiring rule, breaking ties with other agents choosing socially undesirable actions, becomes dominant, confirming at the same time that cooperation flourishes when rewiring is fast enough relative to imitation. We conducted experiments over a population size of N = 1000 and a total number of iterations of H = 1,000,000.
Researcher Affiliation | Academia | Chin-wing Leung (1), Tom Lenaerts (2,3,4) and Paolo Turrini (1); (1) Department of Computer Science, University of Warwick; (2) Machine Learning Group, Université Libre de Bruxelles; (3) Artificial Intelligence Lab, Vrije Universiteit Brussel; (4) Center for Human-Compatible AI, UC Berkeley; chin-wing.leung@warwick.ac.uk, tom.lenaerts@ulb.be, p.turrini@warwick.ac.uk
Pseudocode | Yes | Algorithm 1: The Co-evolutionary Model
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of its source code.
Open Datasets | No | The paper describes a simulation setup ('Consider a network of agents where each agent is randomly connected with z neighbours and assigned an action for the underlying game...'), generating its own data, and does not provide a link or citation to a publicly available dataset.
Dataset Splits | No | The paper describes a simulation model in which agents learn, but it does not specify explicit training/validation/test splits for model evaluation.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., GPU or CPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions the algorithms used (e.g., Q-learning) but does not list specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | In line with our baseline model, we conducted experiments over a population size of N = 1000 and a total number of iterations of H = 1,000,000. Unless otherwise specified, the average neighbourhood size is z = 30, the learning rate α = 0.05, the inverse temperature for Q-learning τ = 5, and the inverse temperature for imitation β = 0.005.
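
For readers weighing what a reimplementation would involve, the Research Type and Experiment Setup rows above already pin down the learning rule (Q-learning with Boltzmann exploration over a sever/maintain rewiring decision) and the reported hyperparameters (α = 0.05, τ = 5, alongside population-level settings N = 1000, z = 30, H = 1,000,000, β = 0.005). The snippet below is a minimal, hypothetical Python sketch of just the Boltzmann action selection and one-step Q-update; it is not the paper's Algorithm 1 (The Co-evolutionary Model), it omits the imitation dynamics governed by β, and all names and the placeholder payoff are our own.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row (constant names are ours).
ALPHA = 0.05   # learning rate alpha
TAU = 5.0      # inverse temperature tau for Q-learning (Boltzmann exploration)

ACTIONS = ("maintain", "sever")   # rewiring choices for one association

def boltzmann_choice(q_values, tau=TAU, rng=np.random.default_rng()):
    """Sample an action index with probability proportional to exp(tau * Q)."""
    prefs = tau * np.asarray(q_values, dtype=float)
    prefs -= prefs.max()                      # numerical stabilisation
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

def q_update(q_values, action, reward, alpha=ALPHA):
    """One-step (stateless) Q-learning update for the chosen rewiring action."""
    q_values[action] += alpha * (reward - q_values[action])
    return q_values

# Usage: an agent deciding whether to keep a tie to one neighbour.
q = np.zeros(len(ACTIONS))        # Q-values for maintain/sever
a = boltzmann_choice(q)           # softmax exploration over the Q-values
payoff = 1.0                      # placeholder reward from the game round
q = q_update(q, a, payoff)
```

A full reproduction would wrap updates like these in the paper's co-evolutionary loop over N = 1000 agents and H = 1,000,000 iterations, interleaving rewiring with payoff-based imitation at inverse temperature β = 0.005, which this sketch does not attempt.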