Emergence of Grounded Compositional Language in Multi-Agent Populations
Authors: Igor Mordatch, Pieter Abbeel
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally investigate how variation in goals, environment configuration, and agents' physical capabilities leads to different communication strategies. In this work, we consider three types of actions an agent needs to perform: go to location, look at location, and do nothing. The goal for agent i consists of an action to perform, a location to perform it at, and an agent r that should perform that action. These goal properties are accumulated into a goal description vector g. (A hypothetical goal-vector sketch follows the table.) |
| Researcher Affiliation | Collaboration | Igor Mordatch, OpenAI, San Francisco, California, USA; Pieter Abbeel, University of California, Berkeley, Berkeley, California, USA |
| Pseudocode | No | The paper describes the policy architecture and learning process using prose and diagrams (e.g., Figure 3: Overview of our policy architecture), but it does not include a dedicated section or figure explicitly labeled "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | No | The paper does not contain any statements indicating that the source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper describes a custom-built, physically-simulated multi-agent learning environment where data is generated. It does not use or provide access information for a pre-existing, publicly available dataset. "At every optimization iteration, we sample a new batch of 1024 random environment instantiations and backpropagate their dynamics through time to calculate the total return gradient." |
| Dataset Splits | No | The paper mentions "Training and test physical reward" in Table 1, but it does not explicitly provide details about a validation split, such as specific percentages or sample counts for training, validation, and test sets. The "new batch of 1024 random environment instantiations" refers to optimization iterations, not a distinct validation set. |
| Hardware Specification | No | The paper describes the experimental setup and training procedures but does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper describes the architecture and techniques used (e.g., "exponential-linear units", "dropout"), but it does not list any specific software libraries, frameworks, or tools with their corresponding version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We build all fully-connected modules with 256 hidden units and 2 layers each in all our experiments, using exponential-linear units and dropout with a rate of 0.1 between all hidden layers. The size of the feature vectors φ is 256 and the size of each memory module is 32. We use a maximum vocabulary size K = 20 in all our experiments. We did not find it necessary to anneal the temperature and set it to 1 in all our experiments for training, and we sample directly from the categorical distribution at test time. Δt is the simulation timestep (we use 0.1), and (1 − γ) is a damping coefficient (we use 0.5). (A module sketch follows the table.) |
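
The Research Type quote describes each agent's goal as an action to perform, a target location, and the agent r that should perform it, collected into a goal description vector g. The sketch below shows one way such a vector could be assembled; the one-hot encoding, field layout, and function name are illustrative assumptions, not the paper's implementation.

```python
# A hypothetical sketch of the goal description vector g quoted in the
# Research Type row: a one-hot action (go-to / look-at / do-nothing), a 2-D
# target location, and a one-hot identifier of the agent r that should act.
# The encoding and names are assumptions for illustration only.
import numpy as np

ACTIONS = ("goto", "look-at", "do-nothing")

def goal_vector(action: str, location_xy, target_agent: int, num_agents: int) -> np.ndarray:
    """Concatenate one-hot action, target location, and one-hot target agent."""
    a = np.zeros(len(ACTIONS))
    a[ACTIONS.index(action)] = 1.0
    r = np.zeros(num_agents)
    r[target_agent] = 1.0
    return np.concatenate([a, np.asarray(location_xy, dtype=float), r])

# Example: agent 1 should go to location (0.3, -0.7) in a 3-agent episode.
g = goal_vector("goto", (0.3, -0.7), target_agent=1, num_agents=3)
print(g.shape)  # (8,)
```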
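
The Experiment Setup row specifies fully-connected modules of 2 layers with 256 hidden units each, exponential-linear units, dropout of 0.1 between hidden layers, and feature vectors φ of size 256. The sketch below reflects those hyperparameters; the use of PyTorch and the class and parameter names are assumptions, since the paper names no framework and releases no code.

```python
# A minimal sketch of the fully-connected processing modules described in the
# Experiment Setup row. PyTorch and all names here are assumptions.
import torch
import torch.nn as nn

class FCModule(nn.Module):
    """Two layers of 256 units, ELU activations, dropout 0.1 between hidden layers."""
    def __init__(self, in_dim: int, feat_dim: int = 256,
                 hidden: int = 256, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, feat_dim),  # output is a 256-d feature vector phi
            nn.ELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: embed a batch of 1024 observations into 256-d feature vectors phi.
# The input dimension (48) is illustrative only.
phi = FCModule(in_dim=48)(torch.randn(1024, 48))
print(phi.shape)  # torch.Size([1024, 256])
```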