Non-Linear Coordination Graphs
Authors: Yipeng Kang, Tonghan Wang, Qianlan Yang, Xiaoran Wu, Chongjie Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the improved representational capacity of our Non-Linear Coordination Graphs (NL-CG) on a matrix game by comparing the learned Q functions to those learned by conventional coordination graphs. We then evaluate our method on the Multi-Agent COordination (MACO) Benchmark [33] for its high requirements on close inter-agent coordination. The experimental results show the superior performance enabled by the non-linear value decomposition. |
| Researcher Affiliation | Academia | Yipeng Kang (Tsinghua University, fringsoo@gmail.com); Tonghan Wang (Harvard University, twang1@g.harvard.edu); Qianlan Yang (UIUC, qianlan2@illinois.edu); Xiaoran Wu (Tsinghua University, wuxr17@tsinghua.org.cn); Chongjie Zhang (Tsinghua University, chongjie@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1 ENUMERATE-OPTIMIZATION /*Show the case for a two-layer mixing network, but can be easily extended to more layers.*/... Algorithm 2 ITERATIVE-OPTIMIZATION /*Show the case for a two-layer mixing network, but can be easily extended to more layers.*/ (a sketch of the enumeration idea follows the table) |
| Open Source Code | Yes | Code is available at https://github.com/fringsoo/CGMIX |
| Open Datasets | Yes | We then evaluate our method on the Multi-Agent COordination (MACO) Benchmark [33] for its high requirements on close inter-agent coordination. |
| Dataset Splits | No | The paper mentions training details like episodes, replay buffer size, and batch size, but does not specify explicit train/validation/test dataset splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | For our algorithm, the utility and payoff functions are fully connected networks with a single hidden layer of 64 units and a ReLU non-linearity. γ is 0.99, and the replay buffer stores the last 500 episodes, from which we uniformly sample batches of size 32 for training. The target network is updated every 100 episodes. The learning rate of RMSprop is set to 5 × 10⁻⁴. (A configuration sketch follows the table.) |
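
The quoted Algorithm 1 (ENUMERATE-OPTIMIZATION) is only excerpted in the pseudocode row above. The sketch below illustrates the general idea under stated assumptions: with a two-layer mixing network, the ReLU activations partition the input space into linear regions, so enumerating activation patterns and keeping only consistent ones recovers the exact greedy joint action. Everything here (the toy graph, `n_agents`, `q_ij`, `W1`, the brute-force inner loop) is an illustrative assumption, not the authors' implementation; in the paper's setting the inner maximization over each resulting linear coordination graph would be handled by message passing rather than brute force.

```python
# Illustrative sketch of the "enumerate linear regions" idea behind
# ENUMERATE-OPTIMIZATION. All names and sizes here are hypothetical;
# see https://github.com/fringsoo/CGMIX for the authors' implementation.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, n_hidden = 3, 2, 4
edges = [(0, 1), (1, 2)]  # toy coordination-graph edges

# Per-agent utilities q_i(a_i) and per-edge payoffs q_ij(a_i, a_j).
q_i = rng.normal(size=(n_agents, n_actions))
q_ij = {e: rng.normal(size=(n_actions, n_actions)) for e in edges}

# Two-layer mixing network: Q_tot = w2 . relu(W1 f + b1) + b2, where f
# stacks the utility/payoff values selected by the joint action.
W1 = rng.normal(size=(n_hidden, n_agents + len(edges)))
b1 = rng.normal(size=n_hidden)
w2 = np.abs(rng.normal(size=n_hidden))  # non-negative mixing weights
b2 = rng.normal()

def features(actions):
    """Stack the selected utility and payoff entries for a joint action."""
    f = [q_i[i, actions[i]] for i in range(n_agents)]
    f += [q_ij[(i, j)][actions[i], actions[j]] for (i, j) in edges]
    return np.array(f)

def q_tot(actions):
    pre = W1 @ features(actions) + b1
    return float(w2 @ np.maximum(pre, 0.0) + b2)

best_actions, best_val = None, -np.inf
# Each ReLU activation pattern defines a region where the mixing network is
# linear in f, so the greedy problem reduces to an ordinary (linear)
# coordination graph that message passing (e.g., max-plus) could solve.
# Here the tiny joint action space is brute-forced to keep the toy short.
for pattern in itertools.product([False, True], repeat=n_hidden):
    mask = np.array(pattern)
    for actions in itertools.product(range(n_actions), repeat=n_agents):
        pre = W1 @ features(actions) + b1
        if not np.array_equal(pre > 0, mask):
            continue  # joint action does not lie in this linear region
        val = float((w2 * mask) @ pre + b2)  # equals q_tot in this region
        if val > best_val:
            best_val, best_actions = val, actions

# Every joint action falls in exactly one region, so the search is exact.
exact = max(q_tot(a) for a in itertools.product(range(n_actions), repeat=n_agents))
assert np.isclose(best_val, exact)
print("greedy joint action:", best_actions, "Q_tot:", best_val)
```

Within a fixed activation pattern the value is linear in the selected utilities and payoffs, which is exactly what makes the per-region subproblem a conventional coordination graph.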
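For convenience, the hyperparameters quoted in the experiment-setup row can be collected into a short configuration. This is a minimal PyTorch sketch assuming standard components; input and output dimensions, the class name, and the surrounding training loop are placeholders, not the authors' code.

```python
# Minimal PyTorch sketch wiring up the quoted hyperparameters. Input and
# output dimensions and the class name are illustrative placeholders.
import torch
import torch.nn as nn

class UtilityNet(nn.Module):
    """Utility/payoff head: one 64-unit hidden layer with a ReLU."""
    def __init__(self, in_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64),   # "a single hidden layer of 64 units"
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

GAMMA = 0.99               # discount factor γ
BUFFER_EPISODES = 500      # replay buffer keeps the last 500 episodes
BATCH_SIZE = 32            # uniform samples per training batch
TARGET_UPDATE_EVERY = 100  # target-network sync interval, in episodes

net = UtilityNet(in_dim=48, n_actions=6)        # dims are placeholders
target_net = UtilityNet(in_dim=48, n_actions=6)
target_net.load_state_dict(net.state_dict())    # start synchronized

optimizer = torch.optim.RMSprop(net.parameters(), lr=5e-4)
```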