Non-Linear Coordination Graphs

Authors: Yipeng Kang, Tonghan Wang, Qianlan Yang, Xiaoran Wu, Chongjie Zhang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the improved representational capacity of our Non-Linear Coordination Graphs (NL-CG) on a matrix game by comparing the learned Q functions to those learned by conventional coordination graphs. We then evaluate our method on the Multi-Agent COordination (MACO) Benchmark [33] for its high requirements on close inter-agent coordination. The experimental results show the superior performance enabled by the non-linear value decomposition.
Researcher Affiliation | Academia | Yipeng Kang (Tsinghua University, fringsoo@gmail.com); Tonghan Wang (Harvard University, twang1@g.harvard.edu); Qianlan Yang (UIUC, qianlan2@illinois.edu); Xiaoran Wu (Tsinghua University, wuxr17@tsinghua.org.cn); Chongjie Zhang (Tsinghua University, chongjie@tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1 ENUMERATE-OPTIMIZATION and Algorithm 2 ITERATIVE-OPTIMIZATION are given as pseudocode, each annotated: /*Show the case for a two-layer mixing network, but can be easily extended to more layers.*/ (An illustrative, hedged selection sketch appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/fringsoo/CGMIX
Open Datasets | Yes | We then evaluate our method on the Multi-Agent COordination (MACO) Benchmark [33] for its high requirements on close inter-agent coordination.
Dataset Splits | No | The paper mentions training details like episodes, replay buffer size, and batch size, but does not specify explicit train/validation/test dataset splits for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | For our algorithm, the utility and payoff functions are fully connected networks with a single hidden layer of 64 units and a ReLU non-linearity. γ is 0.99, and the replay buffer stores the last 500 episodes, from which we uniformly sample batches of size 32 for training. The target network is updated every 100 episodes. The learning rate of RMSprop is set to 5 × 10⁻⁴. (A configuration sketch based on these values appears after this table.)
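
The experiment setup row gives enough detail for a minimal configuration sketch. The snippet below is a hedged reconstruction assuming PyTorch: the hyperparameter values (γ, buffer size, batch size, target-update interval, RMSprop learning rate) are quoted from the row above, while the module names, placeholder input/output sizes, and the shape of the pairwise payoff head are illustrative assumptions, not details taken from the authors' repository.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn as nn

obs_dim, n_actions = 32, 5  # placeholder sizes (assumption)

def make_head(in_dim: int, out_dim: int) -> nn.Module:
    # Fully connected network with a single hidden layer of 64 ReLU units,
    # matching the utility/payoff architecture described in the table.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

utility_net = make_head(obs_dim, n_actions)          # per-agent utility head
payoff_net = make_head(2 * obs_dim, n_actions ** 2)  # pairwise payoff head (assumed shape)

config = {
    "gamma": 0.99,                  # discount factor
    "buffer_episodes": 500,         # replay buffer keeps the last 500 episodes
    "batch_size": 32,               # episodes sampled uniformly per update
    "target_update_episodes": 100,  # target network sync interval
    "lr": 5e-4,                     # RMSprop learning rate
}

optimizer = torch.optim.RMSprop(
    list(utility_net.parameters()) + list(payoff_net.parameters()),
    lr=config["lr"],
)
```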
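
The pseudocode row quotes Algorithm 1 (ENUMERATE-OPTIMIZATION) and Algorithm 2 (ITERATIVE-OPTIMIZATION), which the paper uses to make greedy action selection through the non-linear mixing network tractable. The sketch below is not a reproduction of either algorithm; it is a naive brute-force stand-in for very small problems that enumerates every joint action and scores it through an assumed two-layer ReLU mixer, which is the optimization problem those routines address. All sizes, names, and the use of per-agent utilities (rather than full coordination-graph payoffs) are illustrative assumptions.

```python
# Hedged sketch: brute-force greedy joint-action selection through a small
# two-layer mixing network. Only feasible for toy action spaces.
import itertools
import torch
import torch.nn as nn

n_agents, n_actions, hidden = 3, 4, 8  # toy sizes (assumption)

# Per-agent utilities Q_i(a_i); in the paper these come from learned utility
# and payoff networks defined over a coordination graph.
utilities = torch.randn(n_agents, n_actions)

# Two-layer non-linear mixing network mapping the chosen utilities to a
# scalar joint value Q_tot.
mixer = nn.Sequential(nn.Linear(n_agents, hidden), nn.ReLU(), nn.Linear(hidden, 1))

best_value, best_joint = float("-inf"), None
for joint in itertools.product(range(n_actions), repeat=n_agents):
    chosen = utilities[torch.arange(n_agents), torch.tensor(joint)]
    q_tot = mixer(chosen).item()
    if q_tot > best_value:
        best_value, best_joint = q_tot, joint

print("greedy joint action:", best_joint, "Q_tot:", best_value)
```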