Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Actor-Critic learning for mean-field control in continuous time
Authors: Noufel FRIKHA, Maximilien GERMAIN, Mathieu LAURIERE, Huyen PHAM, Xuanye SONG
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate the results of our algorithms with some numerical experiments on concrete examples. Keywords: Mean-field control, reinforcement learning, policy gradient, linear-quadratic, actor-critic algorithms |
| Researcher Affiliation | Collaboration | Noufel FRIKHA Noufel.Frikha at univ-paris1.fr CES, UMR 8174, Université Paris 1 Panthéon-Sorbonne Maximilien GERMAIN maximilien.germain at gmail.com Morgan Stanley Mathieu LAURIERE mathieu.lauriere at nyu.edu NYU Shanghai Huyên PHAM huyen.pham at polytechnique.edu CMAP, École Polytechnique Xuanye SONG xuanye.song at ntu.edu.sg NTU, Singapore |
| Pseudocode | Yes | Algorithm 1: Offline actor-critic mean-field algorithm Algorithm 2: Online actor-critic mean-field algorithm |
| Open Source Code | No | The text is ambiguous or lacks a clear, affirmative statement of release. |
| Open Datasets | No | We implement our actor-critic algorithms with a simulator of X for coefficients equal to T = 1, γ = 1, B = B = 0.6, I = 0.4, P = Q = 1, ... The simulator for X is based on the real mean-field model: |
| Dataset Splits | No | for each episode i = 1, . . . , N do ... We simulate 10 populations, each consisting of 10^4 agents. |
| Hardware Specification | No | No hardware is mentioned |
| Software Dependencies | No | We use neural networks with 3 hidden layers, 10 neurons per layer and tanh activation functions. ... The derivatives w.r.t. η of Kη, Rη, hence of Jη, as well as the derivative w.r.t. θ of log pθ are computed by automatic differentiation. |
| Experiment Setup | Yes | Here we used the following parameters: µtk was initialized at 0; the number of episodes was N = 21000; the time horizon was T = 1 and the time step ∆t = 0.02. The values of the model parameters were as described above. The learning rates (ρS, ρE, ρG) and λ were taken as ρS = 0.2 constant and, at iteration i: (0.01, 0.1, 0.01, 0.2) if i ≤ 500, (0.1, 0.1, 0.1, 0.1) if 500 < i ≤ 21000; ρG(i) = (0.03, 0.05, 0.03) if i ≤ 7000, (0.01, 0.01, 0.01) if 7000 < i ≤ 10000, (0.005, 0.01, 0.005) if 10000 < i ≤ 14000, (0.002, 0.002, 0.002) if 17000 < i ≤ 21000; λ(i) = 0.1 if i ≤ 8000, 0.01 if 8000 < i ≤ 14000, 0.001 if 14000 < i ≤ 21000. Moreover, after i = 14000 iterations, we also increase the size of the minibatch from 20 to 40. |
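The "Software Dependencies" row describes the network architecture used in the experiments (3 hidden layers, 10 neurons per layer, tanh activations, gradients via automatic differentiation) without naming a framework, which is why no dependency could be classified. A minimal NumPy sketch of such a network's forward pass is given below; the input/output dimensions, initialization scheme, and function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def init_mlp(sizes, rng):
    """Initialize weights and biases for a fully connected network.
    sizes: layer widths, e.g. [d_in, 10, 10, 10, d_out] for the
    3x10 hidden architecture described in the paper."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)  # scaled init (assumption)
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def mlp_forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
# Input dimension 2 and scalar output are hypothetical choices.
params = init_mlp([2, 10, 10, 10, 1], rng)
y = mlp_forward(params, np.ones(2))
```

In the paper's setup, the gradients of such networks with respect to their parameters would be obtained by automatic differentiation rather than hand-derived backpropagation, but the source does not specify which autodiff library was used.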