Bi-Level Actor-Critic for Multi-Agent Coordination
Authors: Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Weinan Zhang, Jun Wang
AAAI 2020, pp. 7325–7332 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A convergence proof is given, and the resulting learning algorithm is tested against state-of-the-art methods. We found that the proposed bi-level actor-critic algorithm successfully converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway merge environment. |
| Researcher Affiliation | Collaboration | 1 University College London; 2 Shanghai Jiao Tong University; 3 Huawei R&D UK |
| Pseudocode | No | The paper describes update rules using mathematical equations (e.g., Eq. 8-11, 12-16) and textual descriptions, but no explicitly labeled 'Pseudocode' or 'Algorithm' block is provided. |
| Open Source Code | Yes | Our experiments are repeatable and the source code is provided in https://github.com/laonahongchen/Bilevel-Optimization-in-Coordination-Game. |
| Open Datasets | Yes | We used a slightly modified version of the Highway environment (Leurent 2018), in which an agent can observe the kinematics of the nearby agent including its position and velocity... An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env. |
| Dataset Splits | No | The paper describes the training process and exploration strategies but does not specify explicit train/validation/test dataset splits, percentages, or sample counts for any of the environments used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and methods used (e.g., 'Gumbel-Softmax estimator', 'DQN') but does not specify their version numbers or other ancillary software dependencies with versions. |
| Experiment Setup | No | The paper describes general aspects of the experimental setup, such as using a 'three layer fully connected neural network with ReLU' and a 'decaying ϵ-greedy method for exploration', and mentions learning rates αi and β, but does not provide concrete hyperparameter values (e.g., specific learning rates, batch sizes, number of epochs) or other detailed system-level training settings. |
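The assessment above notes that the algorithm converges to Stackelberg equilibria in matrix games. As a minimal illustration of that solution concept (not the paper's actual algorithm or its game matrices), the following sketch enumerates the pure-strategy Stackelberg equilibrium of a hypothetical 2x2 coordination game: the leader commits to the action whose follower best response yields the leader its highest payoff.

```python
import numpy as np

# Hypothetical coordination game (illustrative only, not from the paper):
# rows are the leader's actions, columns the follower's.
leader_payoff = np.array([[4.0, 0.0],
                          [0.0, 2.0]])
follower_payoff = np.array([[2.0, 0.0],
                            [0.0, 4.0]])

def stackelberg_equilibrium(L, F):
    """Enumerate pure-strategy outcomes: for each leader action, the
    follower best-responds; the leader then picks the action whose
    induced outcome maximizes the leader's own payoff."""
    best = None
    for a in range(L.shape[0]):
        b = int(np.argmax(F[a]))  # follower's best response to leader action a
        if best is None or L[a, b] > L[best[0], best[1]]:
            best = (a, b)
    return best

a, b = stackelberg_equilibrium(leader_payoff, follower_payoff)
print(a, b)  # -> 0 0: the leader's commitment selects the (4, 2) outcome
```

With these matrices the game has two pure Nash equilibria, (4, 2) and (2, 4); fixing a leader-follower order breaks the symmetry and selects a single asymmetric solution, which is the coordination effect the bi-level formulation exploits.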