Neurosymbolic Transformers for Multi-Agent Communication

Authors: Jeevana Priya Inala, Yichen Yang, James Paulos, Yewen Pu, Osbert Bastani, Vijay Kumar, Martin Rinard, Armando Solar-Lezama

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Our experiments demonstrate how our approach can synthesize policies that generate low-degree communication graphs while maintaining near-optimal performance."
Researcher Affiliation | Academia | 1 MIT CSAIL, 2 University of Pennsylvania
Pseudocode | No | The paper describes the synthesis algorithm in prose but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | "The code and a video illustrating the different tasks are available at https://github.com/jinala/multi-agent-neurosym-transformers."
Open Datasets | No | The paper describes custom simulation environments ('Formation task', 'Unlabeled goals task') but does not provide access information or citations for a publicly available dataset.
Dataset Splits | No | The paper mentions training with '10k rollouts' and building a dataset for synthesis using '300 rollouts' but does not specify explicit training/validation/test splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models or memory amounts, used to run its experiments.
Software Dependencies | No | The paper does not list specific software dependencies, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "For all approaches, we train the model with 10k rollouts. For synthesizing the programmatic policy, we build a dataset using 300 rollouts and run MCMC for 10000 steps. We retrain the transformer with 1000 rollouts. We constrain the maximum in-degree to be a constant d0 across all approaches (except tf-full, where each agent communicates with every other agent); for dist and hard-attn, we do so by setting the communication neighbors to be k = d0, and for prog and prog-retrain, we choose the number of rules to be K = d0."
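The quoted experiment setup translates naturally into a small configuration object. The sketch below is a hypothetical illustration, not code from the authors' repository: the class and field names are invented, the rollout and MCMC counts come from the quoted text, and the value of d0 is a placeholder since the excerpt does not fix it.

```python
# Hypothetical config sketch for the setup quoted above; names are illustrative,
# only the numeric values (10k / 300 / 10000 / 1000) come from the paper's text.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    train_rollouts: int = 10_000    # rollouts used to train each model
    synthesis_rollouts: int = 300   # rollouts collected to build the synthesis dataset
    mcmc_steps: int = 10_000        # MCMC iterations for synthesizing the programmatic policy
    retrain_rollouts: int = 1_000   # rollouts used to retrain the transformer (prog-retrain)
    max_in_degree: int = 4          # d0: cap on incoming communication edges (placeholder value)

    def neighbors_k(self) -> int:
        # dist / hard-attn baselines: number of communication neighbors k = d0
        return self.max_in_degree

    def num_rules_K(self) -> int:
        # prog / prog-retrain: number of synthesized rules K = d0
        return self.max_in_degree

cfg = ExperimentConfig()
print(cfg.neighbors_k(), cfg.num_rules_K())  # both equal d0 by construction
```

Note that tf-full is the one exception: it imposes no in-degree cap because every agent communicates with every other agent.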