Emergence of Grounded Compositional Language in Multi-Agent Populations

Authors: Igor Mordatch, Pieter Abbeel

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally investigate how variation in goals, environment configuration, and agents' physical capabilities leads to different communication strategies. In this work, we consider three types of actions an agent needs to perform: go to location, look at location, and do nothing. The goal for agent i consists of an action to perform, a location r̄ to perform it at, and an agent r that should perform that action. These goal properties are accumulated into a goal description vector g.
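
To make the quoted goal structure concrete, the following is a minimal sketch of how a goal description vector g could be encoded. The action set is the one quoted from the paper; the one-hot encoding, field ordering, and dimensions are illustrative assumptions, since the paper releases no code.

import numpy as np

# Hypothetical encoding of the goal description vector g described above:
# an action to perform, a location to perform it at, and the agent that
# should perform it. Field layout and sizes are assumptions, not paper code.
ACTIONS = ["goto", "lookat", "do_nothing"]

def make_goal_vector(action: str, location: np.ndarray, target_agent: int,
                     num_agents: int) -> np.ndarray:
    """Concatenate a one-hot action, a 2-D target location, and a one-hot
    identifier of the agent that should perform the action."""
    action_onehot = np.eye(len(ACTIONS))[ACTIONS.index(action)]
    agent_onehot = np.eye(num_agents)[target_agent]
    return np.concatenate([action_onehot, location, agent_onehot])

# Example: agent 1 should go to location (0.3, -1.2) in a 3-agent environment.
g = make_goal_vector("goto", np.array([0.3, -1.2]), target_agent=1, num_agents=3)
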
Researcher Affiliation | Collaboration | Igor Mordatch (OpenAI, San Francisco, California, USA); Pieter Abbeel (University of California, Berkeley, Berkeley, California, USA)
Pseudocode | No | The paper describes the policy architecture and learning process in prose and diagrams (e.g., Figure 3: Overview of our policy architecture), but it does not include a dedicated section or figure explicitly labeled "Pseudocode" or "Algorithm" with structured steps.
Open Source Code | No | The paper does not contain any statements indicating that the source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes a custom-built, physically-simulated multi-agent learning environment where data is generated. It does not use or provide access information for a pre-existing, publicly available dataset. "At every optimization iteration, we sample a new batch of 1024 random environment instantiations and backpropagate their dynamics through time to calculate the total return gradient."
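
As a rough illustration of the quoted training procedure, a sketch of one optimization iteration follows. Only the batch size of 1024, the per-iteration resampling, the timestep 0.1, the damping 0.5, and the idea of backpropagating through the time-unrolled dynamics come from the paper; sample_environment, rollout, the reward, and the policy interface are simplified stand-ins, since the real environment and code are not released.

import torch

BATCH_SIZE = 1024   # stated: a new batch of 1024 random instantiations per iteration
HORIZON = 50        # illustrative rollout length (not specified in this row)
DT = 0.1            # simulation timestep from the quoted setup
GAMMA = 0.5         # damping coefficient from the quoted setup

def sample_environment(num_agents=3, num_landmarks=3):
    """Hypothetical random environment instantiation: agent and landmark positions."""
    return {"agents": torch.rand(num_agents, 2),
            "landmarks": torch.rand(num_landmarks, 2)}

def rollout(policy, env):
    """Hypothetical differentiable rollout: integrate damped point-mass dynamics
    driven by the policy and accumulate an illustrative physical reward."""
    pos = env["agents"].clone()
    vel = torch.zeros_like(pos)
    total_reward = torch.zeros(())
    for _ in range(HORIZON):
        u = policy(pos)                        # policy outputs 2-D forces per agent
        vel = vel * (1.0 - GAMMA) + u * DT     # damped velocity update
        pos = pos + vel * DT
        # illustrative reward: stay close to the first landmark
        total_reward = total_reward - (pos - env["landmarks"][0]).pow(2).sum()
    return total_reward

def train_step(policy, optimizer):
    """One optimization iteration: fresh batch, backpropagate through dynamics."""
    returns = torch.stack([rollout(policy, sample_environment())
                           for _ in range(BATCH_SIZE)])
    loss = -returns.mean()    # maximize return = minimize negative return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Usage with a toy linear policy (placeholder for the paper's architecture):
policy = torch.nn.Linear(2, 2)
train_step(policy, torch.optim.Adam(policy.parameters(), lr=1e-3))
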
Dataset Splits | No | The paper mentions "Training and test physical reward" in Table 1, but it does not explicitly provide details about a validation split, such as specific percentages or sample counts for training, validation, and test sets. The "new batch of 1024 random environment instantiations" refers to per-iteration training batches, not a distinct validation set.
Hardware Specification | No | The paper describes the experimental setup and training procedures but does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper describes the architecture and techniques used (e.g., "exponential-linear units", "dropout"), but it does not list any specific software libraries, frameworks, or tools with their corresponding version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | We build all fully-connected modules with 256 hidden units and 2 layers each in all our experiments, using exponential-linear units and dropout with a rate of 0.1 between all hidden layers. The size of feature vectors φ is 256 and the size of each memory module is 32. We use a maximum vocabulary size K = 20 in all our experiments. We did not find it necessary to anneal the temperature and set it to 1 in all our experiments for training, and we sample directly from the categorical distribution at test time. Δt is the simulation timestep (we use 0.1), and (1 − γ) is a damping coefficient (we use 0.5).
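
The stated hyperparameters map onto a small PyTorch-style sketch like the one below. This is an assumed reconstruction, not the authors' code: the module boundaries, input sizes, and the use of a Gumbel-Softmax relaxation for the communication symbol are assumptions consistent with (but beyond) the quoted text; only the 256-unit 2-layer fully-connected blocks, ELU activations, dropout of 0.1, φ size 256, K = 20, and temperature 1 are taken from the quote.

import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN = 256       # stated: 256 hidden units, 2 layers per fully-connected module
DROPOUT = 0.1      # stated: dropout rate 0.1 between hidden layers
PHI_DIM = 256      # stated: feature vectors φ are size 256
VOCAB_K = 20       # stated: maximum vocabulary size K = 20
TEMPERATURE = 1.0  # stated: fixed at 1, no annealing

class FCModule(nn.Module):
    """Hypothetical fully-connected block matching the stated hyperparameters."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, HIDDEN), nn.ELU(), nn.Dropout(DROPOUT),
            nn.Linear(HIDDEN, HIDDEN), nn.ELU(), nn.Dropout(DROPOUT),
            nn.Linear(HIDDEN, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def emit_symbol(logits, training: bool):
    """Discrete symbol over a K = 20 vocabulary: relaxed sampling at temperature 1
    during training, direct categorical sampling at test time (as quoted)."""
    if training:
        return F.gumbel_softmax(logits, tau=TEMPERATURE, hard=False)
    return torch.distributions.Categorical(logits=logits).sample()

# Usage: map an observation to a φ feature and a communication symbol.
obs_to_phi = FCModule(in_dim=10, out_dim=PHI_DIM)   # input size 10 is illustrative
phi_to_word = FCModule(in_dim=PHI_DIM, out_dim=VOCAB_K)
phi = obs_to_phi(torch.randn(4, 10))
word = emit_symbol(phi_to_word(phi), training=True)
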