Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Authors: Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In §4 we illustrate the empirical performance of the proposed algorithm. [...] To verify the performance of our proposed method, we conduct an experiment on the mountaincar dataset [46] under a setting similar to [15]: to collect the dataset, we ran Sarsa with d = 300 features to obtain the policy, then we generate the trajectories of actions and states according to the policy with M samples.
Researcher Affiliation | Academia | Hoi-To Wai, The Chinese University of Hong Kong, Shatin, Hong Kong, htwai@se.cuhk.edu.hk; Zhuoran Yang, Princeton University, Princeton, NJ, USA, zy6@princeton.edu; Zhaoran Wang, Northwestern University, Evanston, IL, USA, zhaoranwang@gmail.com; Mingyi Hong, University of Minnesota, Minneapolis, MN, USA, mhong@umn.edu
Pseudocode | Yes | Algorithm 1: PD-DistIAG Method for Multi-agent, Primal-dual, Finite-sum Optimization
Open Source Code | No | The paper does not contain any concrete access information (e.g., a specific repository link, an explicit code release statement, or mention of code in supplementary materials) for the source code of the methodology.
Open Datasets | Yes | To verify the performance of our proposed method, we conduct an experiment on the mountaincar dataset [46] under a setting similar to [15]; to collect the dataset [...]
Dataset Splits | No | The paper mentions 'M = 5000 samples' but does not specify train/validation/test dataset splits (e.g., percentages, sample counts for each split, or a reference to predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using Sarsa and comparing with PDBG, GTD2, and SAGA, but it does not specify version numbers for any software dependencies or libraries required to replicate the experiments.
Experiment Setup | Yes | For PD-DistIAG, we simulate a communication network with N = 10 agents, connected on an Erdos-Renyi graph generated with connectivity of 0.2; for the step sizes, we set γ1 = 0.005/λmax(Â), γ2 = 5 × 10⁻³. For this problem, we have d = 300, M = 5000 samples, and there are N = 10 agents.
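
The communication network in the experiment setup can be sketched as follows. This is only an illustrative reconstruction, not the authors' code (none is released): it samples an Erdos-Renyi graph with N = 10 nodes and edge probability 0.2, then builds a doubly stochastic mixing matrix. Resampling until the graph is connected, the Metropolis weight rule, and all function names are assumptions that this summary does not confirm. The helper step_sizes takes λmax(Â) as an input because how Â is formed is not described here.

```python
import numpy as np

def erdos_renyi_adjacency(n, p, rng):
    """Sample a symmetric 0/1 adjacency matrix; each edge appears with probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)  # strict upper triangle, no self-loops
    return (upper | upper.T).astype(int)

def is_connected(adj):
    """Depth-first search from node 0; True if every node is reached."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(adj[i]):
            j = int(j)
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == adj.shape[0]

def sample_connected_graph(n, p, seed=0, max_tries=10000):
    """ASSUMPTION: resample until connected (the summary does not say how this was handled)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        adj = erdos_renyi_adjacency(n, p, rng)
        if is_connected(adj):
            return adj
    raise RuntimeError("no connected graph found within max_tries")

def metropolis_weights(adj):
    """ASSUMPTION: Metropolis rule, one common way to get a symmetric doubly stochastic matrix."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.flatnonzero(adj[i]):
            W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # remaining mass stays on the node itself
    return W

def step_sizes(lam_max):
    """Step sizes as quoted above: gamma1 = 0.005 / lambda_max(A_hat), gamma2 = 5e-3."""
    return 0.005 / lam_max, 5e-3

# Values quoted in the experiment setup: N = 10 agents, connectivity 0.2.
adj = sample_connected_graph(n=10, p=0.2)
W = metropolis_weights(adj)
gamma1, gamma2 = step_sizes(lam_max=1.0)  # lam_max = 1.0 is a placeholder, not a reported value
```

Any other symmetric, doubly stochastic matrix supported on the graph's edges would serve equally well in this sketch; the Metropolis rule is chosen only because it needs nothing beyond the node degrees.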