Agent Modelling under Partial Observability for Deep Reinforcement Learning

Authors: Georgios Papoudakis, Filippos Christianos, Stefano Albrecht

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We provide a comprehensive evaluation and ablation studies in cooperative, competitive and mixed multi-agent environments, showing that our method achieves higher returns than baseline methods which do not use the learned representations.

Researcher Affiliation | Academia | School of Informatics, University of Edinburgh. {g.papoudakis, f.christianos, s.albrecht}@ed.ac.uk

Pseudocode | Yes | The pseudocode of LIAM is given in Appendix A and the implementation details in Appendix D.

Open Source Code | Yes | We provide an implementation of LIAM at https://github.com/uoe-agents/LIAM

Open Datasets | Yes | We evaluate the proposed method in three multi-agent environments (one cooperative, one mixed, one competitive): double speaker-listener [Mordatch and Abbeel, 2017], level-based foraging [Albrecht and Stone, 2017; Papoudakis et al., 2021], and a version of predator-prey proposed by Böhmer et al. [2020].

Dataset Splits | No | The paper mentions a 'training set Π' and evaluates on policies from Π, but does not provide specific numerical train/validation/test splits (e.g., an 80/10/10 split) or refer to external resources that define such splits.

Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory specifications) used to run the experiments.

Software Dependencies | No | The paper mentions general software components and algorithms such as A2C, PyTorch, Adam, LSTM, and ReLU, but does not provide specific version numbers for these or other software dependencies.

Experiment Setup | No | The paper mentions general training practices such as using different learning rates for the RL and encoder-decoder networks and averaging results over five runs with different initial seeds, but it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, or number of epochs) in the main text.
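The "Experiment Setup" row notes that the paper trains the RL network and the encoder-decoder with different learning rates but does not state the values. In PyTorch (the framework the paper reports using), this setup is typically expressed as one Adam optimizer with two parameter groups. The sketch below illustrates that pattern only; the module sizes and both learning rates are placeholders, not LIAM's actual hyperparameters.

```python
import torch
import torch.nn as nn

# Hypothetical module shapes, for illustration only.
encoder = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
decoder = nn.Linear(32, 16)  # reconstructs modelled-agent information
policy = nn.Linear(32, 4)    # RL (A2C) head

# One optimizer, two parameter groups: the RL head and the
# encoder-decoder each get their own learning rate, as the paper
# describes. Both lr values here are made-up placeholders.
optimizer = torch.optim.Adam([
    {"params": policy.parameters(), "lr": 3e-4},
    {"params": list(encoder.parameters()) + list(decoder.parameters()),
     "lr": 1e-3},
])
```

With this layout, a single `optimizer.step()` updates both networks while each group keeps its own learning rate; per-group values can also be changed later via `optimizer.param_groups[i]["lr"]`.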