Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
Authors: Elizaveta Tennant, Stephen Hailes, Mirco Musolesi
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. [payoff structures for these three games are sketched below the table] |
| Researcher Affiliation | Academia | Elizaveta Tennant¹, Stephen Hailes¹, Mirco Musolesi¹,² (¹University College London, ²University of Bologna) {l.karmannaya.16, s.hailes, m.musolesi}@ucl.ac.uk |
| Pseudocode | No | The paper provides mathematical equations and definitions but does not include any blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code: https://github.com/Liza-Tennant/moral_choice_dyadic ... Also for this reason, the code used for this study is available as open source software to encourage further work in this area. |
| Open Datasets | No | The paper models iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma, and Stag Hunt) where agents learn by interacting. These are not pre-existing publicly available datasets with links or citations in the typical sense. |
| Dataset Splits | No | The paper describes a simulation setup of 10000 iterations per episode, repeated 100 times, but it does not specify training, validation, or test dataset splits in the conventional sense, as it generates interaction data dynamically. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, processors) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | We use a linearly decaying exploration rate ϵ (from 1.0 to 0...), a steady learning rate α = 0.01 (...), and discount factor γ = 0.90. At the start of the game, all Q-values are initialized to 0. Each pair of agents interact in one episode for 10000 iterations of a given social dilemma game. We repeat each episode 100 times, randomizing seeds and initial states at every run. [a minimal Q-learning sketch wired to these settings follows below] |
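
For readers unfamiliar with the three games named in the Research Type row, the sketch below encodes canonical two-player payoff matrices for them. It is a minimal illustration, not the paper's implementation: the concrete payoff values, the `GAMES` and `payoffs` names, and the action encoding (0 = cooperate / volunteer / hunt stag, 1 = defect / free-ride / hunt hare) are assumptions chosen to satisfy each game's standard payoff ordering.

```python
import numpy as np

# Canonical 2x2 payoff matrices for the row player; GAMES[g][a1, a2] is the
# row player's payoff when it plays a1 against an opponent playing a2.
# Action 0 = cooperate, 1 = defect. Values are illustrative placeholders
# satisfying each game's standard ordering, not the payoffs from the paper.
GAMES = {
    "prisoners_dilemma": np.array([[3.0, 0.0],    # T > R > P > S: 4 > 3 > 1 > 0
                                   [4.0, 1.0]]),
    "volunteers_dilemma": np.array([[3.0, 3.0],   # volunteering is costly but safe
                                    [4.0, 0.0]]),
    "stag_hunt": np.array([[4.0, 0.0],            # stag pays only if both commit
                           [3.0, 1.0]]),
}

def payoffs(game: str, a1: int, a2: int) -> tuple[float, float]:
    """Return (player 1, player 2) payoffs; both players share one symmetric matrix."""
    m = GAMES[game]
    return float(m[a1, a2]), float(m[a2, a1])
```

For example, `payoffs("prisoners_dilemma", 0, 1)` returns `(0.0, 4.0)`: the cooperator is exploited while the defector collects the temptation payoff, which is exactly the dynamic the paper studies.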
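
The Experiment Setup row pins down the learning hyperparameters: a linearly decaying ϵ, α = 0.01, γ = 0.90, zero-initialized Q-values, 10000-iteration episodes, and 100 repeated runs. The sketch below wires those reported numbers into a self-contained tabular Q-learning loop for one dyad; the state representation (each agent observes its opponent's previous action), the initial state, and the Prisoner's Dilemma payoff values are assumptions, since the excerpt does not specify them.

```python
import numpy as np

# Hyperparameters exactly as reported in the row above.
ALPHA, GAMMA = 0.01, 0.90     # learning rate, discount factor
N_ITER, N_RUNS = 10_000, 100  # iterations per episode, repeated episodes

# Row player's payoff matrix for an assumed Prisoner's Dilemma;
# PAYOFF[my_action, opp_action], with 0 = cooperate and 1 = defect.
PAYOFF = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

def run_episode(rng: np.random.Generator) -> np.ndarray:
    """Run one dyadic episode and return both agents' Q-tables.

    Assumed state: the opponent's previous action (2 states). Q-values start
    at 0 and epsilon decays linearly from 1.0 toward 0, per the excerpt.
    """
    q = np.zeros((2, 2, 2))            # [agent, state, action]
    state = np.array([0, 0])           # assumed start: "opponent cooperated"
    for t in range(N_ITER):
        eps = 1.0 - t / N_ITER         # linear decay of the exploration rate
        acts = np.array([
            rng.integers(2) if rng.random() < eps
            else int(np.argmax(q[i, state[i]]))
            for i in range(2)
        ])
        rewards = (PAYOFF[acts[0], acts[1]], PAYOFF[acts[1], acts[0]])
        next_state = acts[::-1]        # each agent observes the other's action
        for i in range(2):
            # Standard Q-learning update with the reported alpha and gamma.
            best_next = q[i, next_state[i]].max()
            q[i, state[i], acts[i]] += ALPHA * (
                rewards[i] + GAMMA * best_next - q[i, state[i], acts[i]]
            )
        state = next_state
    return q

# 100 repeats with different seeds, mirroring the randomized runs in the excerpt.
q_tables = [run_episode(np.random.default_rng(seed)) for seed in range(N_RUNS)]
```

Note that holding α fixed at 0.01 while ϵ decays means late-episode behavior is driven almost entirely by the learned Q-values, which is what lets stable cooperation or defection emerge over the 10000 iterations.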