Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
Authors: Elizaveta Tennant, Stephen Hailes, Mirco Musolesi
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. [payoff structures for these three games are sketched below the table] |
| Researcher Affiliation | Academia | Elizaveta Tennant¹, Stephen Hailes¹, Mirco Musolesi¹,² (¹University College London, ²University of Bologna) {l.karmannaya.16, s.hailes, m.musolesi}@ucl.ac.uk |
| Pseudocode | No | The paper provides mathematical equations and definitions but does not include any blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code: https://github.com/Liza-Tennant/moral_choice_dyadic ... Also for this reason, the code used for this study is available as open source software to encourage further work in this area. |
| Open Datasets | No | The paper models iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma, and Stag Hunt) where agents learn by interacting. These are not pre-existing publicly available datasets with links or citations in the typical sense. |
| Dataset Splits | No | The paper describes a simulation setup of 10000 iterations per episode, repeated 100 times, but it does not specify training, validation, or test dataset splits in the conventional sense, as it generates interaction data dynamically. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, processors) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | We use a linearly decaying exploration rate ϵ (from 1.0 to 0...), a steady learning rate α = 0.01 (...), and discount factor γ = 0.90. At the start of the game, all Q-values are initialized to 0. Each pair of agents interact in one episode for 10000 iterations of a given social dilemma game. We repeat each episode 100 times, randomizing seeds and initial states at every run. [a minimal Q-learning sketch wired to these settings follows below] |
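
For readers unfamiliar with the three games named in the Research Type row, the sketch below encodes canonical two-player payoff matrices for them. It is a minimal illustration, not the paper's implementation: the concrete payoff values, the `GAMES` and `payoffs` names, and the action encoding (0 = cooperate / volunteer / hunt stag, 1 = defect / free-ride / hunt hare) are assumptions chosen to satisfy each game's standard payoff ordering.

```python
import numpy as np

# Canonical 2x2 payoff matrices for the row player; GAMES[g][a1, a2] is the
# row player's payoff when it plays a1 against an opponent playing a2.
# Action 0 = cooperate, 1 = defect. Values are illustrative placeholders
# satisfying each game's standard ordering, not the payoffs from the paper.
GAMES = {
    "prisoners_dilemma": np.array([[3.0, 0.0],    # T > R > P > S: 4 > 3 > 1 > 0
                                   [4.0, 1.0]]),
    "volunteers_dilemma": np.array([[3.0, 3.0],   # volunteering is costly but safe
                                    [4.0, 0.0]]),
    "stag_hunt": np.array([[4.0, 0.0],            # stag pays only if both commit
                           [3.0, 1.0]]),
}

def payoffs(game: str, a1: int, a2: int) -> tuple[float, float]:
    """Return (player 1, player 2) payoffs; both players share one symmetric matrix."""
    m = GAMES[game]
    return float(m[a1, a2]), float(m[a2, a1])
```

For example, `payoffs("prisoners_dilemma", 0, 1)` returns `(0.0, 4.0)`: the cooperator is exploited while the defector collects the temptation payoff, which is exactly the dynamic the paper studies.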
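
The Experiment Setup row pins down the learning hyperparameters: a linearly decaying ϵ, α = 0.01, γ = 0.90, zero-initialized Q-values, 10000-iteration episodes, and 100 repeated runs. The sketch below wires those reported numbers into a self-contained tabular Q-learning loop for one dyad; the state representation (each agent observes its opponent's previous action), the initial state, and the Prisoner's Dilemma payoff values are assumptions, since the excerpt does not specify them.

```python
import numpy as np

# Hyperparameters exactly as reported in the row above.
ALPHA, GAMMA = 0.01, 0.90     # learning rate, discount factor
N_ITER, N_RUNS = 10_000, 100  # iterations per episode, repeated episodes

# Row player's payoff matrix for an assumed Prisoner's Dilemma;
# PAYOFF[my_action, opp_action], with 0 = cooperate and 1 = defect.
PAYOFF = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

def run_episode(rng: np.random.Generator) -> np.ndarray:
    """Run one dyadic episode and return both agents' Q-tables.

    Assumed state: the opponent's previous action (2 states). Q-values start
    at 0 and epsilon decays linearly from 1.0 toward 0, per the excerpt.
    """
    q = np.zeros((2, 2, 2))            # [agent, state, action]
    state = np.array([0, 0])           # assumed start: "opponent cooperated"
    for t in range(N_ITER):
        eps = 1.0 - t / N_ITER         # linear decay of the exploration rate
        acts = np.array([
            rng.integers(2) if rng.random() < eps
            else int(np.argmax(q[i, state[i]]))
            for i in range(2)
        ])
        rewards = (PAYOFF[acts[0], acts[1]], PAYOFF[acts[1], acts[0]])
        next_state = acts[::-1]        # each agent observes the other's action
        for i in range(2):
            # Standard Q-learning update with the reported alpha and gamma.
            best_next = q[i, next_state[i]].max()
            q[i, state[i], acts[i]] += ALPHA * (
                rewards[i] + GAMMA * best_next - q[i, state[i], acts[i]]
            )
        state = next_state
    return q

# 100 repeats with different seeds, mirroring the randomized runs in the excerpt.
q_tables = [run_episode(np.random.default_rng(seed)) for seed in range(N_RUNS)]
```

Note that holding α fixed at 0.01 while ϵ decays means late-episode behavior is driven almost entirely by the learned Q-values, which is what lets stable cooperation or defection emerge over the 10000 iterations.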