Non-Linear Coordination Graphs

Authors: Yipeng Kang, Tonghan Wang, Qianlan Yang, Xiaoran Wu, Chongjie Zhang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the improved representational capacity of our Non-Linear Coordination Graphs (NL-CG) on a matrix game by comparing the learned Q functions to those learned by conventional coordination graphs. We then evaluate our method on the Multi-Agent COordination (MACO) Benchmark [33] for its high requirements on close inter-agent coordination. The experimental results show the superior performance enabled by the non-linear value decomposition.
Researcher Affiliation | Academia | Yipeng Kang (Tsinghua University, fringsoo@gmail.com); Tonghan Wang (Harvard University, twang1@g.harvard.edu); Qianlan Yang (UIUC, qianlan2@illinois.edu); Xiaoran Wu (Tsinghua University, wuxr17@tsinghua.org.cn); Chongjie Zhang (Tsinghua University, chongjie@tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1 ENUMERATE-OPTIMIZATION and Algorithm 2 ITERATIVE-OPTIMIZATION are given as pseudocode, each annotated: /*Show the case for a two-layer mixing network, but can be easily extended to more layers.*/ (An illustrative, hedged selection sketch appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/fringsoo/CGMIX
Open Datasets | Yes | We then evaluate our method on the Multi-Agent COordination (MACO) Benchmark [33] for its high requirements on close inter-agent coordination.
Dataset Splits | No | The paper mentions training details like episodes, replay buffer size, and batch size, but does not specify explicit train/validation/test dataset splits for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | For our algorithm, the utility and payoff functions are fully connected networks with a single hidden layer of 64 units and a ReLU non-linearity. γ is 0.99, and the replay buffer stores the last 500 episodes, from which we uniformly sample batches of size 32 for training. The target network is updated every 100 episodes. The learning rate of RMSprop is set to 5 × 10⁻⁴. (A configuration sketch based on these values appears after this table.)
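
The experiment setup row gives enough detail for a minimal configuration sketch. The snippet below is a hedged reconstruction assuming PyTorch: the hyperparameter values (γ, buffer size, batch size, target-update interval, RMSprop learning rate) are quoted from the row above, while the module names, placeholder input/output sizes, and the shape of the pairwise payoff head are illustrative assumptions, not details taken from the authors' repository.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn as nn

obs_dim, n_actions = 32, 5  # placeholder sizes (assumption)

def make_head(in_dim: int, out_dim: int) -> nn.Module:
    # Fully connected network with a single hidden layer of 64 ReLU units,
    # matching the utility/payoff architecture described in the table.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

utility_net = make_head(obs_dim, n_actions)          # per-agent utility head
payoff_net = make_head(2 * obs_dim, n_actions ** 2)  # pairwise payoff head (assumed shape)

config = {
    "gamma": 0.99,                  # discount factor
    "buffer_episodes": 500,         # replay buffer keeps the last 500 episodes
    "batch_size": 32,               # episodes sampled uniformly per update
    "target_update_episodes": 100,  # target network sync interval
    "lr": 5e-4,                     # RMSprop learning rate
}

optimizer = torch.optim.RMSprop(
    list(utility_net.parameters()) + list(payoff_net.parameters()),
    lr=config["lr"],
)
```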
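
The pseudocode row quotes Algorithm 1 (ENUMERATE-OPTIMIZATION) and Algorithm 2 (ITERATIVE-OPTIMIZATION), which the paper uses to make greedy action selection through the non-linear mixing network tractable. The sketch below is not a reproduction of either algorithm; it is a naive brute-force stand-in for very small problems that enumerates every joint action and scores it through an assumed two-layer ReLU mixer, which is the optimization problem those routines address. All sizes, names, and the use of per-agent utilities (rather than full coordination-graph payoffs) are illustrative assumptions.

```python
# Hedged sketch: brute-force greedy joint-action selection through a small
# two-layer mixing network. Only feasible for toy action spaces.
import itertools
import torch
import torch.nn as nn

n_agents, n_actions, hidden = 3, 4, 8  # toy sizes (assumption)

# Per-agent utilities Q_i(a_i); in the paper these come from learned utility
# and payoff networks defined over a coordination graph.
utilities = torch.randn(n_agents, n_actions)

# Two-layer non-linear mixing network mapping the chosen utilities to a
# scalar joint value Q_tot.
mixer = nn.Sequential(nn.Linear(n_agents, hidden), nn.ReLU(), nn.Linear(hidden, 1))

best_value, best_joint = float("-inf"), None
for joint in itertools.product(range(n_actions), repeat=n_agents):
    chosen = utilities[torch.arange(n_agents), torch.tensor(joint)]
    q_tot = mixer(chosen).item()
    if q_tot > best_value:
        best_value, best_joint = q_tot, joint

print("greedy joint action:", best_joint, "Q_tot:", best_value)
```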