Communication Learning via Backpropagation in Discrete Channels with Unknown Noise

Authors: Benjamin Freed, Guillaume Sartoretti, Jiaheng Hu, Howie Choset (pp. 7160-7168)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach in two example multi-robot tasks: a path finding and a collaborative search problem. There, we show that our approach achieves learning speed and performance similar to differentiable communication learning with real-valued messages (i.e., unlimited communication bandwidth), while naturally handling more realistic real-world communication constraints.
Researcher Affiliation | Academia | Benjamin Freed, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213; Guillaume Sartoretti, National University of Singapore, 21 Lower Kent Ridge Rd, Singapore 119077; Jiaheng Hu, Columbia University, 116th St and Broadway, New York, NY 10027; Howie Choset, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
Pseudocode | No | The paper describes procedures with diagrams (Figs. 1-4) but does not provide formal pseudocode or algorithm blocks.
Open Source Code | Yes | The code for these experiments is available at http://bit.ly/37r7q7y.
Open Datasets | No | The paper describes custom multi-robot tasks for its experiments (Hidden-Goal Path-Finding, Coordinated Multi-Agent Search) but does not refer to publicly available datasets with specific access information, citations, or repository links.
Dataset Splits | No | The paper describes experimental tasks and conditions but does not specify explicit training, validation, or test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions general software components like 'actor-critic algorithm' and 'convolutional stack' but does not specify particular software packages, libraries, or their version numbers.
Experiment Setup | Yes | The actor-critic algorithm is used as the reinforcement learning algorithm for all experiments, with separate actor and critic networks. Both actor and critic networks for all tasks are composed of a convolutional stack followed by two fully-connected layers. The policy networks used in the search task also use a simple single-layer recurrent neural network... For a given task, the same architecture is used for both the differentiable and RCL approaches we test... In both tasks, we consider 2 agents that are each able to send the other 40 bits of information at each timestep. ...agents output a real-valued communication signal with 10 elements at each timestep. Each element of this communication signal is then discretized independently into one of 16 discrete message elements, meaning each element can be thought of as a 4-bit word (a nibble), allowing for a total of 16^10 = 2^40 different possible messages, equivalent to 40 bits.
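As a rough illustration of the message discretization described in the Experiment Setup row (a 10-element real-valued signal, each element quantized to one of 16 levels, i.e., one nibble per element and 40 bits per message), the following minimal NumPy sketch reproduces the arithmetic. The value range, uniform binning, and function name are assumptions made here for illustration; the paper's released code (linked above) defines the actual discretization.

    import numpy as np

    NUM_ELEMENTS = 10   # length of the real-valued communication signal (from the paper)
    NUM_LEVELS = 16     # discretization levels per element -> 4 bits each (a nibble)

    def discretize_message(signal, low=-1.0, high=1.0):
        # Quantize each element into one of NUM_LEVELS uniform bins.
        # The [-1, 1] range and uniform binning are assumptions for illustration;
        # the paper does not specify the exact quantizer in this excerpt.
        signal = np.clip(signal, low, high)
        bins = np.floor((signal - low) / (high - low) * NUM_LEVELS).astype(int)
        return np.minimum(bins, NUM_LEVELS - 1)  # integers in {0, ..., 15}

    # Total message space: 16^10 = 2^40 distinct messages, i.e., 40 bits per timestep.
    total_messages = NUM_LEVELS ** NUM_ELEMENTS
    assert total_messages == 2 ** 40

    signal = np.random.uniform(-1.0, 1.0, size=NUM_ELEMENTS)
    message = discretize_message(signal)                 # array of 10 nibbles
    bits = NUM_ELEMENTS * int(np.log2(NUM_LEVELS))       # 40 bits of information
    print(message, bits)

This sketch only checks the bandwidth accounting (16 levels per element times 10 elements equals 40 bits); it does not reproduce how the paper's method backpropagates through the discrete channel.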