Learning Altruistic Behaviours in Reinforcement Learning without External Rewards
Authors: Tim Franzmeyer, Mateusz Malinowski, João F. Henriques
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach in three different multi-agent environments where another agent's success depends on altruistic behaviour. Finally, we show that our unsupervised agents can perform comparably to agents explicitly trained to work cooperatively, in some cases even outperforming them. |
| Researcher Affiliation | Collaboration | Tim Franzmeyer, University of Oxford, frtim@robots.ox.ac.uk; Mateusz Malinowski, DeepMind, mateuszm@google.com; João F. Henriques, University of Oxford, joao@robots.ox.ac.uk |
| Pseudocode | No | The paper provides mathematical formalizations and descriptions of methods, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will provide the source code for all experiments conducted with the final version of this publication. We created detailed instructions on how to run the code in order to replicate the experimental outcomes presented in this work. |
| Open Datasets | Yes | We use a fully-observable multi-agent environment that enables us to assess the level of cooperation among agents (level-based foraging, LBF, Christianos et al. (2020)) to evaluate the performance of altruistic agents in more complex environments with discrete state spaces. We use a multi-agent tag environment (Tag, Mordatch and Abbeel (2018); Lowe et al. (2017); Terry et al. (2020)), illustrated in Fig. 2 (right), to evaluate the capabilities of altruistic agents in complex environments with continuous state spaces. (See the environment-loading sketch below the table.) |
| Dataset Splits | No | The paper describes training procedures (e.g., pretraining, freezing policies, training another agent) and evaluations over multiple episodes and random seeds. However, it does not explicitly specify dataset splits (e.g., 80/10/10) for training, validation, and testing of a static dataset. |
| Hardware Specification | Yes | All experiments were run on single cores of Intel Xeon E7-8867v3 processors (2.5 GHz). |
| Software Dependencies | No | The paper mentions several algorithms and frameworks (e.g., Q-Learning, Deep Q-Learning (DQL), MADDPG, DDPG, TRPO), but it does not specify version numbers for any of the software dependencies or libraries used. |
| Experiment Setup | Yes | Appendix B lists all details and parameters. All training parameters can be found in Table 4. Exact setup specifications and all parameters are given in appendix D. Exact hyper-parameters are given in Table 4. |
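
The Open Datasets row above points to two public multi-agent environments rather than static datasets. As a rough orientation, the sketch below shows one plausible way to instantiate them from their open-source packages; the package names, environment IDs, and call signatures are assumptions based on publicly released versions of `lbforaging` and PettingZoo, not details reported in the paper, and they vary across releases.

```python
# Hedged sketch (not from the paper): loading the two public environments named in the
# Open Datasets row. Environment IDs and return signatures are assumptions and differ
# across lbforaging / PettingZoo releases.
import gym
import lbforaging  # noqa: F401  # importing registers the "Foraging-*" Gym environments

# Level-based foraging (LBF, Christianos et al. 2020): grid size, number of agents, and
# food count are encoded in the ID; the exact ID/version suffix here is an assumption.
lbf_env = gym.make("Foraging-8x8-2p-1f-v2")
obs = lbf_env.reset()
# One random joint action; LBF environments expect one action per agent.
obs, rewards, dones, info = lbf_env.step(lbf_env.action_space.sample())

# Multi-agent Tag (Mordatch & Abbeel 2018; Lowe et al. 2017) via PettingZoo's MPE suite
# (Terry et al. 2020); the module's version suffix (simple_tag_v2) depends on the release.
from pettingzoo.mpe import simple_tag_v2

tag_env = simple_tag_v2.parallel_env()
obs = tag_env.reset()
actions = {agent: tag_env.action_space(agent).sample() for agent in tag_env.agents}
obs, rewards, dones, infos = tag_env.step(actions)
```

Both packages are typically installed from PyPI (e.g. `pip install lbforaging pettingzoo`), which is consistent with the Software Dependencies row: the paper names the algorithms and environments but does not pin library versions.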