Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)
Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 8. Experimental Results We consider the following setting (taken from (Subramanian and Mahajan, 2019) with slight modifications) for our numerical experiment. ... In Fig. 1a, we demonstrate how the following error changes as a function of N. ... In Fig. 1b, we exhibit that the N-agent and mean-field cost values generated by π are close for large N, and both of them lie below the specified upper bound, ζ. |
| Researcher Affiliation | Academia | Washim Uddin Mondal EMAIL Lyles School of Civil Engineering, School of Industrial Engineering, Purdue University, West Lafayette, IN, 47907, USA Vaneet Aggarwal EMAIL School of Industrial Engineering, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, 47907, USA Satish V. Ukkusuri EMAIL Lyles School of Civil Engineering, Purdue University, West Lafayette, IN, 47907, USA |
| Pseudocode | Yes | Algorithm 1 Natural Policy Gradient Algorithm to solve the Dual Problem Input: η1, η2, α: Learning rates, J, L: Number of execution steps, w0, Φ0, λ0 = 0: Initial parameters, µ0: Initial state distribution, Constraint bound: ζ 1: for j ∈ {0, 1, …, J − 1} do ... Appendix O. Sampling Procedure Algorithm 2 Sampling Algorithm Input: µ0, πΦj, P, r 1: Sample x0 ∼ µ0. 2: Sample u0 ∼ πΦj(x0, µ0) |
| Open Source Code | No | The text does not contain any explicit statement about releasing source code for the methodology described in this paper, nor does it provide a direct link to a code repository. |
| Open Datasets | No | We consider the following setting (taken from (Subramanian and Mahajan, 2019) with slight modifications) for our numerical experiment. ... The initial state distribution, µ0, is taken to be a uniform distribution over Q, and x_0^N is obtained by taking N-independent samples from µ0. |
| Dataset Splits | No | The paper describes a numerical experiment in a simulated environment where initial state distributions are generated from a uniform distribution. It does not use a pre-existing dataset with explicit training/test/validation splits. |
| Hardware Specification | Yes | The experiments were performed on a 1.8 GHz Dual-Core Intel i5 processor with 8 GB 1600 MHz DDR3 memory. |
| Software Dependencies | No | The paper mentions hyperparameters used in Algorithm 1 but does not specify any software libraries or dependencies with version numbers. |
| Experiment Setup | Yes | The hyperparameters used in Algorithm 1 are chosen as follows: η1 = η2 = α = 10⁻³, J = L = 10². The values of different system parameters are given as: αR = 1, βR = 0.5, λR = 0.5, λC = 1, ζ = 5, γ = 0.9, and Q = 10. |
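
The reported setup can be collected into a small configuration object, which makes the values easier to reuse when attempting a reproduction. The sketch below is illustrative only and is not the authors' code (none was released). It assumes the learning rates are 10⁻³ and J = L = 10² (the exponents are an interpretation of the extracted text), and it assumes that Q = 10 denotes a state space of ten states labeled 0 to 9. The names `ExperimentConfig`, `sample_initial_states`, and `empirical_distribution` are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters and system parameters as reported in the table above."""
    eta1: float = 1e-3    # learning rate (assumed exponent: -3)
    eta2: float = 1e-3    # learning rate (assumed exponent: -3)
    alpha: float = 1e-3   # learning rate (assumed exponent: -3)
    J: int = 100          # number of execution steps (assumed 10^2)
    L: int = 100          # number of execution steps (assumed 10^2)
    alpha_R: float = 1.0
    beta_R: float = 0.5
    lambda_R: float = 0.5
    lambda_C: float = 1.0
    zeta: float = 5.0     # constraint upper bound
    gamma: float = 0.9    # discount factor
    Q: int = 10           # assumed: number of states, labeled 0..Q-1


def sample_initial_states(n_agents, num_states, rng):
    """Draw N independent initial states from a uniform distribution."""
    return [rng.randrange(num_states) for _ in range(n_agents)]


def empirical_distribution(states, num_states):
    """Fraction of agents in each state (the empirical state distribution)."""
    counts = Counter(states)
    return [counts[s] / len(states) for s in range(num_states)]


if __name__ == "__main__":
    cfg = ExperimentConfig()
    rng = random.Random(0)  # fixed seed so the run is repeatable
    states = sample_initial_states(1000, cfg.Q, rng)
    print(empirical_distribution(states, cfg.Q))
```

With a large number of agents the printed empirical distribution should be close to uniform, which mirrors the role of µ0 in the mean-field approximation.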