Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming

Authors: Sachin G Konan, Esmaeil Seraj, Matthew Gombolay

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains."
Researcher Affiliation | Academia | "Sachin Konan, Esmaeil Seraj, Matthew Gombolay; Georgia Institute of Technology, Atlanta, GA 30332, USA; {skonan, eseraj3}@gatech.edu, matthew.gombolay@cc.gatech.edu"
Pseudocode | Yes | "Please refer to Appendix, Section A.1 for pseudocode and details of our training and execution procedures. [...] Algorithm 1: Training the Mutual Information Maximizing Policy Gradient (InfoPG)." (An illustrative sketch of the iterated-reasoning loop appears after this table.)
Open Source Code | Yes | "We also publicized our source code in a public repository, available online at https://github.com/CORE-Robotics-Lab/InfoPG."
Open Datasets | Yes | "Our testing environments include: (1) Cooperative Pong (Co-op Pong) (Terry et al., 2020), (2) Pistonball (Terry et al., 2020), (3) Multiwalker (Gupta et al., 2017; Terry et al., 2020), and (4) StarCraft II (Vinyals et al., 2017), i.e., the 3M (three marines vs. three marines) challenge. [...] Domains are part of the PettingZoo (Terry et al., 2020) MARL research library and can be accessed online at https://www.pettingzoo.ml/envs. StarCraft II (Vinyals et al., 2017) can be accessed from DeepMind's repository, available online at https://github.com/deepmind/pysc2." (A minimal environment-loading sketch appears after this table.)
Dataset Splits | No | The paper discusses training and testing but does not explicitly mention a separate validation set or specific training/validation/test splits with percentages or counts.
Hardware Specification | Yes | "Hardware Specifics: All experiments were conducted on an NVIDIA Quadro RTX 8000 with approximately 50 GB of Video Memory Capacity."
Software Dependencies | No | The paper discusses the use of AlexNet and specific RNN types (GRU, LSTM, VRNN) but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "Additionally, we have provided the details of our implementations for training and execution as well as the full hyperparameter lists for all methods, baselines, and experiments in the Appendix, Section A.9." Tables 2-6 provide detailed hyperparameters such as learning rate, batch size, and discount factor. (An illustrative configuration sketch appears after this table.)
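
For orientation, the sketch below illustrates the kind of k-level iterated reasoning the Pseudocode row refers to: each agent encodes its observation into a latent, and over k communication rounds refines that latent against its teammates' latents before acting. This is a minimal illustration under assumptions, not the authors' Algorithm 1: the layer sizes, the mean-pooled message, and the tanh fusion rule are all stand-ins, and the mutual-information objective that gives InfoPG its name is omitted entirely.

```python
import torch
import torch.nn as nn

class IteratedReasoningAgent(nn.Module):
    """Sketch of one agent in a k-level reasoning team (all sizes are assumptions)."""
    def __init__(self, obs_dim: int, latent_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)
        # Fuses the agent's own latent with an aggregated teammate message.
        self.comm = nn.Linear(2 * latent_dim, latent_dim)
        self.policy_head = nn.Linear(latent_dim, n_actions)

def team_act(agents, observations, k_levels=2):
    """One decentralized decision step: k rounds of latent exchange, then act."""
    # Level 0: every agent encodes its own observation.
    latents = [torch.tanh(a.encoder(o)) for a, o in zip(agents, observations)]
    for _ in range(k_levels):
        # Each round, agent i conditions on the mean of its teammates' latents
        # (this aggregation rule is an assumption, not taken from the paper).
        messages = [
            torch.stack([z for j, z in enumerate(latents) if j != i]).mean(dim=0)
            for i in range(len(latents))
        ]
        latents = [
            torch.tanh(a.comm(torch.cat([z, m], dim=-1)))
            for a, z, m in zip(agents, latents, messages)
        ]
    # Final latents parameterize each agent's action distribution.
    dists = [torch.distributions.Categorical(logits=a.policy_head(z))
             for a, z in zip(agents, latents)]
    return [d.sample() for d in dists], dists

# Tiny smoke test with made-up dimensions.
team = [IteratedReasoningAgent(obs_dim=8, latent_dim=16, n_actions=4) for _ in range(3)]
obs = [torch.randn(8) for _ in range(3)]
actions, dists = team_act(team, obs, k_levels=2)
```

InfoPG's actual update couples this decision step to a mutual-information objective over agents' policies; that coupling is deliberately left out here, since the precise form belongs to the paper's Algorithm 1.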
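The PettingZoo domains cited under Open Datasets can be instantiated in a few lines. Below is a minimal sketch using PettingZoo's parallel API, with random actions standing in for trained agents. The module version suffixes (pistonball_v6, cooperative_pong_v5, multiwalker_v9) and the five-tuple step() signature follow current PettingZoo releases and are assumptions here; older releases, closer to the ones the paper would have used, return a four-tuple from step() and observations alone from reset().

```python
# pip install "pettingzoo[butterfly]"  (version suffixes track the installed release)
from pettingzoo.butterfly import pistonball_v6  # cooperative_pong_v5 works analogously
# from pettingzoo.sisl import multiwalker_v9

env = pistonball_v6.parallel_env()
observations, infos = env.reset()
while env.agents:
    # Random policies stand in for the trained agents in this sketch.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```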
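Finally, since the hyperparameters live in Tables 2-6 of the appendix rather than in a machine-readable file, a reproduction typically begins by transcribing them into a configuration object. The dictionary below only illustrates the shape such a config might take; every value is a placeholder, not a setting from the paper.

```python
# Placeholder values only; the real settings are in Tables 2-6 of the paper's appendix.
config = {
    "learning_rate": 1e-3,     # placeholder
    "batch_size": 32,          # placeholder
    "discount_factor": 0.99,   # placeholder
    "k_levels": 2,             # depth of iterated reasoning; placeholder
    "env": "pistonball",       # one of the four benchmark domains
}
```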