Learning Attentional Communication for Multi-Agent Cooperation

Authors: Jiechuan Jiang, Zongqing Lu

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show the strength of our model in a variety of cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies than existing methods. We implement ATOC as an extension of the actor-critic model, which is trained end-to-end by backpropagation. We empirically show the success of ATOC in three scenarios, which correspond to the cooperation of agents for local reward, a shared global reward, and reward in competition, respectively. Experiments are performed based on the multi-agent particle environment [14, 18], which is a two-dimensional world with continuous space and discrete time, consisting of agents and landmarks. We compare ATOC with CommNet, BiCNet, and DDPG. We trained ATOC and the baselines with the settings of N = 50 and L = 50...
Researcher Affiliation | Academia | Jiechuan Jiang (Peking University, jiechuan.jiang@pku.edu.cn); Zongqing Lu (Peking University, zongqing.lu@pku.edu.cn)
Pseudocode | No | The paper describes the model architecture and training process in prose and equations, but it does not include a pseudocode block or an algorithm box.
Open Source Code | No | The paper does not contain any statements about releasing source code or links to a code repository.
Open Datasets | Yes | Experiments are performed based on the multi-agent particle environment [14, 18], which is a two-dimensional world with continuous space and discrete time, consisting of agents and landmarks.
Dataset Splits | No | The paper mentions using a 'replay buffer' and minibatch training, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or counts for each split).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU/GPU models, memory, or cluster specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'ReLU and batch normalization' for the neural networks, but it does not specify version numbers for any programming languages, libraries, or other software components.
Experiment Setup | Yes | In all the experiments, we use the Adam optimizer with a learning rate of 0.001. The discount factor of reward γ is 0.96. For the soft update of target networks, we use τ = 0.001. The actor network has four hidden layers; the second layer is the thought (128 units), and the output layer uses the tanh activation function. The critic network has two hidden layers with 512 and 256 units, respectively. For communication, T is 15. The capacity of the replay buffer is 10^5, and each update samples a minibatch of 2560. We accumulate experiences in the first thirty episodes before training. As in DDPG, we use an Ornstein-Uhlenbeck process with θ = 0.15 and σ = 0.2 for the exploration noise process.
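
To make the Experiment Setup row easier to act on, below is a minimal sketch of those quoted settings. It assumes PyTorch (the paper names no framework); the observation/action dimensions and the widths of the hidden layers the paper does not state (everything except the 128-unit thought layer and the 512/256-unit critic layers) are placeholders, and the attentional communication channel is omitted, so this is an illustration of the reported hyperparameters rather than the authors' implementation.

# Hedged sketch of the hyperparameters quoted in the Experiment Setup row.
# Framework (PyTorch), OBS_DIM/ACT_DIM, and the unstated layer widths are assumptions.
import numpy as np
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 26, 2          # placeholders, not given in the paper
LR, GAMMA, TAU = 1e-3, 0.96, 1e-3 # Adam learning rate, discount, soft-update rate
THOUGHT_DIM = 128                 # "the second layer is the thought (128 units)"
BUFFER_SIZE, BATCH_SIZE = 10**5, 2560
COMM_T = 15                       # communication interval T

class Actor(nn.Module):
    """Four hidden layers; the second hidden layer is the 128-unit 'thought'."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(OBS_DIM, 128)            # width assumed
        self.thought = nn.Linear(128, THOUGHT_DIM)    # thought layer (stated: 128 units)
        self.fc3 = nn.Linear(THOUGHT_DIM, 64)         # width assumed
        self.fc4 = nn.Linear(64, 64)                  # width assumed
        self.out = nn.Linear(64, ACT_DIM)

    def forward(self, obs):
        h = torch.relu(self.fc1(obs))
        thought = torch.relu(self.thought(h))
        h = torch.relu(self.fc3(thought))
        h = torch.relu(self.fc4(h))
        return torch.tanh(self.out(h)), thought       # tanh output, as reported

class Critic(nn.Module):
    """Two hidden layers with 512 and 256 units, as reported."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(OBS_DIM + ACT_DIM, 512)
        self.fc2 = nn.Linear(512, 256)
        self.out = nn.Linear(256, 1)

    def forward(self, obs, act):
        h = torch.relu(self.fc1(torch.cat([obs, act], dim=-1)))
        h = torch.relu(self.fc2(h))
        return self.out(h)

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise with theta = 0.15, sigma = 0.2."""
    def __init__(self, size, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.state = np.zeros(size)

    def sample(self):
        dx = -self.theta * self.state + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state

def soft_update(target, source, tau=TAU):
    """Polyak-average target-network parameters with tau = 0.001."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)

actor, critic = Actor(), Critic()
actor_opt = torch.optim.Adam(actor.parameters(), lr=LR)
critic_opt = torch.optim.Adam(critic.parameters(), lr=LR)

With τ = 0.001, soft_update makes the target networks track the learned networks very slowly, which is the usual DDPG-style stabilization the quoted settings imply.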