Learning Altruistic Behaviours in Reinforcement Learning without External Rewards
Authors: Tim Franzmeyer, Mateusz Malinowski, João F. Henriques
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach in three different multi-agent environments where another agent's success depends on altruistic behaviour. Finally, we show that our unsupervised agents can perform comparably to agents explicitly trained to work cooperatively, in some cases even outperforming them. |
| Researcher Affiliation | Collaboration | Tim Franzmeyer, University of Oxford, frtim@robots.ox.ac.uk; Mateusz Malinowski, DeepMind, mateuszm@google.com; João F. Henriques, University of Oxford, joao@robots.ox.ac.uk |
| Pseudocode | No | The paper provides mathematical formalizations and descriptions of methods, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will provide the source code for all experiments conducted with the final version of this publication. We created detailed instructions on how to run the code in order to replicate the experimental outcomes presented in this work. |
| Open Datasets | Yes | We use a fully-observable multi-agent environment that enables us to assess the level of cooperation among agents (level-based foraging, LBF, Christianos et al. (2020)) to evaluate the performance of altruistic agents in more complex environments with discrete state spaces. We use a multi-agent tag environment (Tag, Mordatch and Abbeel (2018); Lowe et al. (2017); Terry et al. (2020)), illustrated in Fig. 2 (right), to evaluate the capabilities of altruistic agents in complex environments with continuous state spaces. (See the environment-loading sketch below the table.) |
| Dataset Splits | No | The paper describes training procedures (e.g., pretraining, freezing policies, training another agent) and evaluations over multiple episodes and random seeds. However, it does not explicitly specify dataset splits (e.g., 80/10/10) for training, validation, and testing of a static dataset. |
| Hardware Specification | Yes | All experiments were run on single cores of Intel Xeon E7-8867v3 processors (2.5 GHz). |
| Software Dependencies | No | The paper mentions several algorithms and frameworks (e.g., Q-Learning, Deep Q-Learning (DQL), MADDPG, DDPG, TRPO), but it does not specify version numbers for any of the software dependencies or libraries used. |
| Experiment Setup | Yes | Appendix B lists all details and parameters. All training parameters can be found in Table 4. Exact setup specifications and all parameters are given in appendix D. Exact hyper-parameters are given in Table 4. |
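
The Open Datasets row above points to two public multi-agent environments rather than static datasets. As a rough orientation, the sketch below shows one plausible way to instantiate them from their open-source packages; the package names, environment IDs, and call signatures are assumptions based on publicly released versions of `lbforaging` and PettingZoo, not details reported in the paper, and they vary across releases.

```python
# Hedged sketch (not from the paper): loading the two public environments named in the
# Open Datasets row. Environment IDs and return signatures are assumptions and differ
# across lbforaging / PettingZoo releases.
import gym
import lbforaging  # noqa: F401  # importing registers the "Foraging-*" Gym environments

# Level-based foraging (LBF, Christianos et al. 2020): grid size, number of agents, and
# food count are encoded in the ID; the exact ID/version suffix here is an assumption.
lbf_env = gym.make("Foraging-8x8-2p-1f-v2")
obs = lbf_env.reset()
# One random joint action; LBF environments expect one action per agent.
obs, rewards, dones, info = lbf_env.step(lbf_env.action_space.sample())

# Multi-agent Tag (Mordatch & Abbeel 2018; Lowe et al. 2017) via PettingZoo's MPE suite
# (Terry et al. 2020); the module's version suffix (simple_tag_v2) depends on the release.
from pettingzoo.mpe import simple_tag_v2

tag_env = simple_tag_v2.parallel_env()
obs = tag_env.reset()
actions = {agent: tag_env.action_space(agent).sample() for agent in tag_env.agents}
obs, rewards, dones, infos = tag_env.step(actions)
```

Both packages are typically installed from PyPI (e.g. `pip install lbforaging pettingzoo`), which is consistent with the Software Dependencies row: the paper names the algorithms and environments but does not pin library versions.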