Learning Altruistic Behaviours in Reinforcement Learning without External Rewards

Authors: Tim Franzmeyer, Mateusz Malinowski, Joao F. Henriques

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach in three different multi-agent environments where another agent's success depends on altruistic behaviour. Finally, we show that our unsupervised agents can perform comparably to agents explicitly trained to work cooperatively, in some cases even outperforming them. |
| Researcher Affiliation | Collaboration | Tim Franzmeyer (University of Oxford, frtim@robots.ox.ac.uk); Mateusz Malinowski (DeepMind, mateuszm@google.com); João F. Henriques (University of Oxford, joao@robots.ox.ac.uk) |
| Pseudocode | No | The paper provides mathematical formalizations and descriptions of its methods, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will provide the source code for all experiments conducted with the final version of this publication. We created detailed instructions on how to run the code in order to replicate the experimental outcomes presented in this work. |
| Open Datasets | Yes | We use a fully-observable multi-agent environment that enables us to assess the level of cooperation among agents (level-based foraging, LBF, Christianos et al. (2020)) to evaluate the performance of altruistic agents in more complex environments with discrete state spaces. We use a multi-agent tag environment (Tag, Mordatch and Abbeel (2018); Lowe et al. (2017); Terry et al. (2020)), illustrated in Fig. 2 (right), to evaluate the capabilities of altruistic agents in complex environments with continuous state spaces. (A minimal environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper describes training procedures (e.g., pretraining, freezing policies, training another agent) and evaluations over multiple episodes and random seeds. However, it does not explicitly specify dataset splits (e.g., 80/10/10) for training, validation, and testing of a static dataset. |
| Hardware Specification | Yes | All experiments were run on single cores of Intel Xeon E7-8867v3 processors (2.5 GHz). |
| Software Dependencies | No | The paper mentions several algorithms and frameworks (e.g., Q-Learning, Deep Q-Learning (DQL), MADDPG, DDPG, TRPO), but it does not specify version numbers for any of the software dependencies or libraries used. |
| Experiment Setup | Yes | Appendix B lists all details and parameters. All training parameters can be found in Table 4. Exact setup specifications and all parameters are given in appendix D. Exact hyper-parameters are given in Table 4. |
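As a rough illustration of the openly available environments referenced in the Open Datasets row, the sketch below instantiates the level-based foraging (LBF) environment and steps it with random actions. The package name, environment ID, and Gym-style interface are assumptions based on the public lbforaging project (Christianos et al., 2020), not details taken from the paper, and may differ across package versions; the Tag environment from the multi-agent particle suite can be set up analogously from its own open-source package.

```python
# Minimal sketch (not from the paper): stepping the open-source level-based
# foraging (LBF) environment with random actions. The package name, environment
# ID, and classic Gym interface are assumptions and may vary across versions.
import gym
import lbforaging  # noqa: F401 -- importing registers the Foraging-* environments

env = gym.make("Foraging-8x8-2p-1f-v2")  # illustrative ID: 8x8 grid, 2 players, 1 food item

obs = env.reset()
for _ in range(50):
    actions = env.action_space.sample()  # random joint action; stands in for learned policies
    obs, rewards, dones, info = env.step(actions)
    if all(dones):
        obs = env.reset()
env.close()
```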