Bi-Level Actor-Critic for Multi-Agent Coordination

Authors: Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Weinan Zhang, Jun Wang (pp. 7325-7332)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. The proposed bi-level actor-critic algorithm converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway-merge environment.
Researcher Affiliation | Collaboration | University College London; Shanghai Jiao Tong University; Huawei R&D UK
Pseudocode | No | The paper describes update rules using mathematical equations (e.g., Eqs. 8-11 and 12-16) and textual descriptions, but provides no explicitly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our experiments are repeatable and the source code is provided in https://github.com/laonahongchen/Bilevel-Optimization-in-Coordination-Game."
Open Datasets | Yes | A slightly modified version of the Highway environment (Leurent 2018), an environment for autonomous driving decision-making (https://github.com/eleurent/highway-env), is used, in which an agent can observe the kinematics of the nearby agent, including its position and velocity.
Dataset Splits | No | The paper describes the training process and exploration strategies but does not specify explicit train/validation/test splits, percentages, or sample counts for any of the environments used.
Hardware Specification | No | The paper does not provide hardware details (e.g., GPU/CPU models, memory, or processor types) used for the experiments.
Software Dependencies | No | The paper mentions software components and methods (e.g., the Gumbel-Softmax estimator, DQN) but does not specify their version numbers or other software dependencies with versions.
Experiment Setup | No | The paper describes general aspects of the experimental setup, such as a three-layer fully connected neural network with ReLU activations and a decaying ε-greedy exploration method, and mentions learning rates α_i and β, but does not provide concrete hyperparameter values (e.g., specific learning rates, batch sizes, number of epochs) or other detailed system-level training settings.
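The central claim assessed above is that the bi-level actor-critic converges to Stackelberg equilibria in matrix games. As a minimal illustration of what such an equilibrium is (this is not the authors' code, and the payoff matrices are invented for the example), a Stackelberg equilibrium of a small matrix game can be found by enumeration: the follower best-responds to each leader action, and the leader commits to the action that maximizes its payoff given that anticipated response.

```python
import numpy as np

def stackelberg_equilibrium(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action) for a finite matrix game:
    the follower best-responds to each leader action (ties broken by
    lowest index), and the leader anticipates this best response."""
    best = None
    for a in range(leader_payoff.shape[0]):
        b = int(np.argmax(follower_payoff[a]))  # follower's best response to a
        if best is None or leader_payoff[a, b] > leader_payoff[best]:
            best = (a, b)
    return best

# Illustrative coordination game with two pure Nash equilibria, (0,0) and (1,1).
# The Stackelberg solution is asymmetric: the leader commits to the
# equilibrium it prefers, and the follower goes along.
L = np.array([[3.0, 0.0],
              [0.0, 2.0]])  # leader payoffs
F = np.array([[2.0, 0.0],
              [0.0, 3.0]])  # follower payoffs
print(stackelberg_equilibrium(L, F))  # (0, 0)
```

This equilibrium-selection effect mirrors the asymmetric solution the report notes for the highway-merge environment: bi-level optimization breaks the symmetry between otherwise equivalent equilibria.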