Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
Authors: Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using a variety of tasks including StarCraft BroodWars™ explore and combat scenarios, we show that our network yields improved performance and convergence rates than the baselines as the scale increases. Our results convey that IC3Net agents learn when to communicate based on the scenario and profitability. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Amanpreet Singh (New York University, Facebook AI Research) amanpreet@nyu.edu; Tushar Jain (New York University) tushar@nyu.edu; Sainbayar Sukhbaatar (New York University, Facebook AI Research) sainbar@cs.nyu.edu |
| Pseudocode | No | The paper describes the model (IC3Net) using text and mathematical equations in Section 3 but does not include any formal pseudocode or algorithm blocks; a hedged sketch of the described communication mechanism is given below the table. |
| Open Source Code | Yes | The code is available at https://github.com/IC3Net/IC3Net. |
| Open Datasets | Yes | We consider three environments for our analysis and experiments. (i) a predator-prey environment (PP)... (ii) a traffic junction environment (TJ) similar to Sukhbaatar et al. (2016)... (iii) StarCraft BroodWars (SC) explore and combat tasks... We implement our model using PyTorch and environments using Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper specifies training epochs ('1000 epochs', '2000 epochs') and discusses the 'final results' of experiments, but it does not explicitly mention or detail a specific dataset split for validation (e.g., as a percentage or sample count). |
| Hardware Specification | No | The paper mentions that 'The training is distributed over 16 cores and each core runs a mini-batch till total episodes steps are 500 or more,' but it does not specify any particular CPU models, GPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | We implement our model using PyTorch and environments using Gym (Brockman et al., 2016). We use RMSProp (Tieleman & Hinton, 2012) with initial learning rate as a tuned hyper-parameter. |
| Experiment Setup | Yes | We set the hidden layer size to 128 units and we use LSTM (Hochreiter & Schmidhuber, 1997) with recurrence for all of the baselines and IC3Net. We use RMSProp (Tieleman & Hinton, 2012) with initial learning rate as a tuned hyper-parameter. All of the models use skip-connections (He et al., 2016). The training is distributed over 16 cores and each core runs a mini-batch till total episode steps are 500 or more. We do 10 weight updates per epoch. We run predator-prey, StarCraft experiments for 1000 epochs, traffic junction experiment for 2000 epochs and report the final results. We utilized curriculum learning (Bengio et al., 2009) to make the training process easier... The learning rate is fixed at 0.003 throughout. (A training-setup sketch is given below the table.) |
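
The paper gives no pseudocode, so the following is a minimal, hypothetical PyTorch sketch of the gated communication step described in Section 3 of the paper (see the Pseudocode row above). The class name `IC3NetCellSketch`, the linear gating head, and the exact way gated hidden states are averaged are our own assumptions, not the authors' released code, which lives in the linked repository.

```python
import torch
import torch.nn as nn


class IC3NetCellSketch(nn.Module):
    """Hypothetical sketch of one IC3Net recurrence + communication step.

    Each agent encodes its observation, adds the incoming communication
    vector, and updates an LSTM state. A binary gate, sampled from a small
    head on the previous hidden state, decides whether the agent broadcasts;
    the next communication vector is the projected average of the other
    agents' gated hidden states. Details may differ from the authors' code.
    """

    def __init__(self, obs_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)       # e(o_t)
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.comm_proj = nn.Linear(hidden_dim, hidden_dim)  # shared projection
        self.gate_head = nn.Linear(hidden_dim, 2)           # "communicate or not"

    def forward(self, obs, comm, h, c):
        # obs: (n_agents, obs_dim); comm, h, c: (n_agents, hidden_dim)
        x = self.encoder(obs) + comm
        h_next, c_next = self.lstm(x, (h, c))

        # Sample the per-agent binary communication gate from the previous state.
        gate = torch.distributions.Categorical(
            logits=self.gate_head(h)).sample().float().unsqueeze(-1)

        # Each agent receives the mean of the *other* agents' gated hidden states.
        gated = self.comm_proj(h_next) * gate
        comm_next = (gated.sum(dim=0, keepdim=True) - gated) / max(obs.size(0) - 1, 1)
        return h_next, c_next, comm_next, gate
```

A policy head mapping `h_next` to environment actions, and the individualized-reward training signal, would sit on top of this cell; both are omitted here.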
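
The optimizer and architecture choices quoted in the Experiment Setup row translate directly into a few lines of configuration. This is a sketch under stated assumptions: the observation size and the placeholder recurrent policy stand in for whatever each task actually requires, and the distribution over 16 cores, the curriculum schedule, and the policy-gradient loss itself are not shown.

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 128        # "hidden layer size to 128 units"
LEARNING_RATE = 0.003   # "The learning rate is fixed at 0.003 throughout."
EPOCHS = 1000           # predator-prey / StarCraft; 2000 for traffic junction
UPDATES_PER_EPOCH = 10  # "We do 10 weight updates per epoch."

# Placeholder recurrent policy standing in for IC3Net or a baseline;
# the input size (64) is purely illustrative.
policy = nn.LSTMCell(input_size=64, hidden_size=HIDDEN_DIM)

# RMSProp with the initial learning rate as a tuned hyper-parameter.
optimizer = torch.optim.RMSprop(policy.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    for _ in range(UPDATES_PER_EPOCH):
        # Roll out episodes until at least 500 total environment steps are
        # collected (distributed over 16 cores in the paper), compute the
        # policy-gradient loss, then:
        #   optimizer.zero_grad(); loss.backward(); optimizer.step()
        pass
```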