Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming
Authors: Sachin G Konan, Esmaeil Seraj, Matthew Gombolay
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains. |
| Researcher Affiliation | Academia | Sachin Konan, Esmaeil Seraj, Matthew Gombolay; Georgia Institute of Technology, Atlanta, GA 30332, USA; {skonan, eseraj3}@gatech.edu, matthew.gombolay@cc.gatech.edu |
| Pseudocode | Yes | Please refer to Appendix, Section A.1 for pseudocode and details of our training and execution procedures. [...] Algorithm 1: Training the Mutual Information Maximizing Policy Gradient (InfoPG). |
| Open Source Code | Yes | We also publicized our source code in a public repository, available online at https://github.com/CORE-Robotics-Lab/InfoPG. |
| Open Datasets | Yes | Our testing environments include: (1) Cooperative Pong (Co-op Pong) (Terry et al., 2020), (2) Pistonball (Terry et al., 2020), (3) Multiwalker (Gupta et al., 2017; Terry et al., 2020), and (4) StarCraft II (Vinyals et al., 2017), i.e., the 3M (three marines vs. three marines) challenge. [...] Domains are part of the PettingZoo (Terry et al., 2020) MARL research library and can be accessed online at https://www.pettingzoo.ml/envs. StarCraft II (Vinyals et al., 2017) can be accessed from DeepMind's repository, available online at https://github.com/deepmind/pysc2. |
| Dataset Splits | No | The paper describes training and testing but does not explicitly mention a separate validation set or specific training/validation/test splits with percentages or counts. |
| Hardware Specification | Yes | Hardware Specifics: All experiments were conducted on an NVIDIA Quadro RTX 8000 with approximately 50 GB of video memory capacity. |
| Software Dependencies | No | The paper discusses the use of AlexNet and specific RNN types (GRU, LSTM, VRNN) but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | Additionally, we have provided the details of our implementations for training and execution, as well as the full hyperparameter lists for all methods, baselines, and experiments, in the Appendix, Section A.9. [...] Tables 2-6 provide detailed hyperparameters such as learning rate, batch size, discount factor, etc. |