TarMAC: Targeted Multi-Agent Communication

Authors: Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on a diverse set of cooperative multi-agent tasks of varying difficulty, with varying numbers of agents, in environments ranging from 2D grid layouts of shapes and simulated traffic junctions to 3D indoor environments, and demonstrate the benefits of targeted and multi-round communication.
Researcher Affiliation | Collaboration | Abhishek Das (1), Théophile Gervet (2), Joshua Romoff (2,3), Dhruv Batra (1,3), Devi Parikh (1,3), Michael Rabbat (2,3), Joelle Pineau (2,3); 1: Georgia Tech, 2: McGill University, 3: Facebook AI Research.
Pseudocode | No | The paper describes the architecture and training procedure but contains no block labeled as pseudocode or as an algorithm.
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is openly available.
Open Datasets | Yes | The SHAPES dataset was introduced by Andreas et al. (2016) and was originally created for testing compositional visual reasoning in visual question answering. It consists of synthetic images of 2D colored shapes arranged in a grid (3×3 cells in the original dataset) along with corresponding question-answer pairs. There are 3 shapes (circle, square, triangle), 3 colors (red, green, blue), and 2 sizes (small, big) in total (see Figure 2). Dataset: github.com/jacobandreas/nmn2/tree/shapes
Dataset Splits | No | The paper does not give explicit percentages or counts for training, validation, and test splits. It describes environments such as SHAPES, Traffic Junction, and House3D, but does not detail how data or episodes within them were partitioned across phases.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions RMSProp as the optimizer but provides no version numbers for any software, libraries, or programming languages (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | All models were trained with a batched synchronous version of the multi-agent Actor-Critic described above, using RMSProp with a learning rate of 7 × 10⁻⁴ and α = 0.99, batch size 16, discount factor γ = 0.99, and entropy regularization coefficient 0.01 for the agent policies. All agent policies are instantiated from the same set of shared parameters, i.e. θ_1 = ... = θ_N. Each agent's GRU hidden state is 128-d, the message signature/query is 16-d, and the message value is 32-d (unless specified otherwise).
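Since the Experiment Setup row amounts to a hyperparameter listing, a minimal configuration sketch is included below. The paper releases no code and names no framework, so PyTorch, the class names (TrainConfig, AgentPolicy), and the heavily reduced single-GRU architecture are assumptions made for illustration only; just the numerical values are taken from the paper.

```python
# Hyperparameter sketch for the values quoted in the Experiment Setup row.
# The paper names no framework and releases no code; PyTorch, the class names
# below, and the reduced architecture are assumptions for illustration.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class TrainConfig:
    lr: float = 7e-4              # RMSProp learning rate, 7 x 10^-4
    rmsprop_alpha: float = 0.99   # RMSProp smoothing constant alpha
    batch_size: int = 16          # batched synchronous actor-critic
    gamma: float = 0.99           # discount factor
    entropy_coef: float = 0.01    # entropy regularization on agent policies
    hidden_dim: int = 128         # per-agent GRU hidden state
    key_dim: int = 16             # message signature/query dimension
    value_dim: int = 32           # message value dimension


class AgentPolicy(nn.Module):
    """One parameter set shared by all N agents (theta_1 = ... = theta_N)."""

    def __init__(self, obs_dim: int, n_actions: int, cfg: TrainConfig):
        super().__init__()
        # GRU input = observation features plus the aggregated 32-d message value.
        self.gru = nn.GRUCell(obs_dim + cfg.value_dim, cfg.hidden_dim)
        # Simplified message head: 16-d signature/query and 32-d value per step.
        self.to_signature = nn.Linear(cfg.hidden_dim, cfg.key_dim)
        self.to_value = nn.Linear(cfg.hidden_dim, cfg.value_dim)
        self.actor = nn.Linear(cfg.hidden_dim, n_actions)
        self.critic = nn.Linear(cfg.hidden_dim, 1)


cfg = TrainConfig()
# obs_dim and n_actions are placeholders; they are task-dependent in the paper.
policy = AgentPolicy(obs_dim=64, n_actions=5, cfg=cfg)
optimizer = torch.optim.RMSprop(policy.parameters(), lr=cfg.lr, alpha=cfg.rmsprop_alpha)
```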
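For the Open Datasets row, a small sketch of the SHAPES attribute space may serve as a sanity check. It only enumerates the symbolic cell descriptions stated in the paper (3 shapes × 3 colors × 2 sizes on a 3×3 grid); the actual dataset at github.com/jacobandreas/nmn2/tree/shapes consists of rendered images with question-answer pairs, and its file format is not reproduced here.

```python
# Illustration of the SHAPES attribute space from the Open Datasets row.
# Symbolic only: the real dataset contains rendered images and QA pairs.
import itertools
import random

SHAPES = ("circle", "square", "triangle")
COLORS = ("red", "green", "blue")
SIZES = ("small", "big")

# 3 shapes x 3 colors x 2 sizes = 18 possible attribute combinations per cell.
CELL_TYPES = list(itertools.product(SHAPES, COLORS, SIZES))
assert len(CELL_TYPES) == 18


def random_grid(rows: int = 3, cols: int = 3, seed: int = 0):
    """Sample a symbolic 3x3 layout, one (shape, color, size) triple per cell."""
    rng = random.Random(seed)
    return [[rng.choice(CELL_TYPES) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    for row in random_grid():
        print(row)
```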