Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

JMLR 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 8. Experimental Results We consider the following setting (taken from (Subramanian and Mahajan, 2019) with slight modifications) for our numerical experiment. ... In Fig. 1a, we demonstrate how the following error changes as a function of N. ... In Fig. 1b, we exhibit that the N-agent and mean-field cost values generated by π are close for large N, and both of them lie below the specified upper bound, ζ. |
| Researcher Affiliation | Academia | Washim Uddin Mondal EMAIL Lyles School of Civil Engineering, School of Industrial Engineering, Purdue University, West Lafayette, IN, 47907, USA; Vaneet Aggarwal EMAIL School of Industrial Engineering, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, 47907, USA; Satish V. Ukkusuri EMAIL Lyles School of Civil Engineering, Purdue University, West Lafayette, IN, 47907, USA |
| Pseudocode | Yes | Algorithm 1 Natural Policy Gradient Algorithm to solve the Dual Problem. Input: η1, η2, α: Learning rates; J, L: Number of execution steps; w0, Φ0, λ0 = 0: Initial parameters; µ0: Initial state distribution; Constraint bound: ζ. 1: for j ∈ {0, 1, ..., J − 1} do ... Appendix O. Sampling Procedure. Algorithm 2 Sampling Algorithm. Input: µ0, πΦj, P, r. 1: Sample x0 ∼ µ0. 2: Sample u0 ∼ πΦj(x0, µ0) |
| Open Source Code | No | The text does not contain any explicit statement about releasing source code for the methodology described in this paper, nor does it provide a direct link to a code repository. |
| Open Datasets | No | We consider the following setting (taken from (Subramanian and Mahajan, 2019) with slight modifications) for our numerical experiment. ... The initial state distribution, µ0, is taken to be a uniform distribution over Q, and x_0^N is obtained by taking N-independent samples from µ0. |
| Dataset Splits | No | The paper describes a numerical experiment in a simulated environment where initial state distributions are generated from a uniform distribution. It does not use a pre-existing dataset with explicit training/test/validation splits. |
| Hardware Specification | Yes | The experiments were performed on a 1.8 GHz Dual-Core Intel i5 processor with 8 GB 1600 MHz DDR3 memory. |
| Software Dependencies | No | The paper mentions hyperparameters used in Algorithm 1 but does not specify any software libraries or dependencies with version numbers. |
| Experiment Setup | Yes | The hyperparameters used in Algorithm 1 are chosen as follows: η1 = η2 = α = 10^{-3}, J = L = 10^2. The values of different system parameters are given as: αR = 1, βR = 0.5, λR = 0.5, λC = 1, ζ = 5, γ = 0.9, and Q = 10. |
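
As an illustration only, the reported setup can be gathered into a small configuration object, together with the initial-state sampling mentioned in the Open Datasets row. This is a minimal sketch and not the authors' code: it reads the learning rates as 10^-3 and the step counts J and L as 10^2, and the names `ExperimentConfig`, `sample_initial_states`, and `empirical_distribution` are hypothetical. It also assumes that Q counts the states and that they are indexed 0 to Q - 1, which the report does not confirm.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters and system parameters from the Experiment Setup row."""
    eta1: float = 1e-3      # learning rate eta_1 (read as 10^-3)
    eta2: float = 1e-3      # learning rate eta_2 (read as 10^-3)
    alpha: float = 1e-3     # learning rate alpha (read as 10^-3)
    J: int = 100            # number of execution steps J (read as 10^2)
    L: int = 100            # number of execution steps L (read as 10^2)
    alpha_R: float = 1.0
    beta_R: float = 0.5
    lambda_R: float = 0.5
    lambda_C: float = 1.0
    zeta: float = 5.0       # constraint bound
    gamma: float = 0.9
    Q: int = 10             # assumption: number of states


def sample_initial_states(n_agents, num_states, rng):
    """Draw n_agents independent states from a uniform distribution.

    Assumption: states are indexed 0..num_states-1.
    """
    return [rng.randrange(num_states) for _ in range(n_agents)]


def empirical_distribution(states, num_states):
    """Return the fraction of agents occupying each state."""
    counts = Counter(states)
    n = len(states)
    return [counts[s] / n for s in range(num_states)]


cfg = ExperimentConfig()
rng = random.Random(0)  # fixed seed so the sketch is repeatable
x0 = sample_initial_states(1000, cfg.Q, rng)
mu_hat = empirical_distribution(x0, cfg.Q)
```

With many agents, `mu_hat` should be close to the uniform distribution µ0, which is the sense in which an N-agent empirical state distribution approaches its mean-field counterpart at the initial step.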