Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Authors: Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In §4 we illustrate the empirical performance of the proposed algorithm. [...] To verify the performance of our proposed method, we conduct an experiment on the mountaincar dataset [46] under a setting similar to [15]: to collect the dataset, we ran Sarsa with d = 300 features to obtain the policy, then we generate the trajectories of actions and states according to the policy with M samples. |
| Researcher Affiliation | Academia | Hoi-To Wai, The Chinese University of Hong Kong, Shatin, Hong Kong, htwai@se.cuhk.edu.hk; Zhuoran Yang, Princeton University, Princeton, NJ, USA, zy6@princeton.edu; Zhaoran Wang, Northwestern University, Evanston, IL, USA, zhaoranwang@gmail.com; Mingyi Hong, University of Minnesota, Minneapolis, MN, USA, mhong@umn.edu |
| Pseudocode | Yes | Algorithm 1: PD-DistIAG Method for Multi-agent, Primal-dual, Finite-sum Optimization |
| Open Source Code | No | The paper does not contain any concrete access information (e.g., a specific repository link, an explicit code release statement, or mention of code in supplementary materials) for the source code of the methodology. |
| Open Datasets | Yes | To verify the performance of our proposed method, we conduct an experiment on the mountaincar dataset [46] under a setting similar to [15] [...] |
| Dataset Splits | No | The paper mentions 'M = 5000 samples' but does not specify the train/validation/test dataset splits (e.g., percentages, sample counts for each split, or reference to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Sarsa and comparing with PDBG, GTD2, and SAGA, but it does not specify version numbers for any software dependencies or libraries required to replicate the experiments. |
| Experiment Setup | Yes | For PD-DistIAG, we simulate a communication network with N = 10 agents, connected on an Erdős-Rényi graph generated with connectivity of 0.2; for the step sizes, we set γ1 = 0.005/λmax(Â), γ2 = 5 × 10⁻³. For this problem, we have d = 300, M = 5000 samples, and there are N = 10 agents. |
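
The communication network in the Experiment Setup row can be approximated with a short sketch. It only reproduces the stated graph parameters (N = 10 agents, connectivity 0.2). Everything else is an assumption made for illustration and is not taken from the paper: the function names, the random seed, the resampling until the graph is connected, and the use of Metropolis weights to obtain a doubly stochastic averaging matrix.

```python
import numpy as np


def erdos_renyi_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix; each edge is present independently with probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)  # strict upper triangle, no self-loops
    return (upper | upper.T).astype(int)


def is_connected(adj):
    """Depth-first search from node 0; True if every node is reachable."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(adj[i]):
            j = int(j)
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == adj.shape[0]


def sample_connected_graph(n=10, p=0.2, seed=0, max_tries=10_000):
    """Resample until connected (an assumption; the source does not say how this was handled)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        adj = erdos_renyi_adjacency(n, p, rng)
        if is_connected(adj):
            return adj
    raise RuntimeError("no connected graph found; increase p or max_tries")


def metropolis_weights(adj):
    """Metropolis weights (assumed choice): symmetric, nonnegative, rows sum to one."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # self-weight absorbs the remaining mass
    return W


if __name__ == "__main__":
    adj = sample_connected_graph(n=10, p=0.2, seed=0)
    W = metropolis_weights(adj)
    print("number of edges:", adj.sum() // 2)
    print("row sums equal one:", np.allclose(W.sum(axis=1), 1.0))
```

A symmetric matrix whose rows sum to one is doubly stochastic, which is the usual requirement for consensus-style averaging over a network; other weighting rules would serve equally well here.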