Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Actor-Critic learning for mean-field control in continuous time

Authors: Noufel FRIKHA, Maximilien GERMAIN, Mathieu LAURIERE, Huyen PHAM, Xuanye SONG

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we illustrate the results of our algorithms with some numerical experiments on concrete examples. Keywords: Mean-field control, reinforcement learning, policy gradient, linear-quadratic, actor-critic algorithms
Researcher Affiliation Collaboration Noufel FRIKHA Noufel.Frikha at univ-paris1.fr CES, UMR 8174, Université Paris 1 Panthéon-Sorbonne Maximilien GERMAIN maximilien.germain at gmail.com Morgan Stanley Mathieu LAURIERE mathieu.lauriere at nyu.edu NYU Shanghai Huyên PHAM huyen.pham at polytechnique.edu CMAP, École Polytechnique Xuanye SONG xuanye.song at ntu.edu.sg NTU, Singapore
Pseudocode Yes Algorithm 1: Offline actor-critic mean-field algorithm Algorithm 2: Online actor-critic mean-field algorithm
Open Source Code No The text is ambiguous or lacks a clear, affirmative statement of release.
Open Datasets No We implement our actor-critic algorithms with a simulator of X for coefficients equal to T = 1, γ = 1, B = B̄ = 0.6, I = 0.4, P = Q = 1, ... The simulator for X is based on the real mean-field model:
Dataset Splits No for each episode i = 1, ..., N do ... We simulate 10 populations, each consisting of 10^4 agents.
Hardware Specification No No hardware is mentioned
Software Dependencies No We use neural networks with 3 hidden layers, 10 neurons per layer and tanh activation functions. ... The derivatives w.r.t. η of K_η, R_η, hence of J_η, as well as the derivative w.r.t. θ of log p_θ, are computed by automatic differentiation.
Experiment Setup Yes Here we used the following parameters: µ_{t_k} was initialized at 0; the number of episodes was N = 21000; the time horizon was T = 1 and the time step was Δt = 0.02. The values of the model parameters were as described above. The learning rates (ρS, ρE, ρG) and λ were taken as follows: ρS = 0.2 constant, and at iteration i, (0.01, 0.1, 0.01, 0.2) if i ≤ 500 and (0.1, 0.1, 0.1, 0.1) if 500 < i ≤ 21000; ρG(i) = (0.03, 0.05, 0.03) if i ≤ 7000, (0.01, 0.01, 0.01) if 7000 < i ≤ 10000, (0.005, 0.01, 0.005) if 10000 < i ≤ 14000, (0.002, 0.002, 0.002) if 17000 < i ≤ 21000; λ = 0.1 if i ≤ 8000, 0.01 if 8000 < i ≤ 14000, 0.001 if 14000 < i ≤ 21000. Moreover, after i = 14000 iterations, we also increase the size of the minibatch from 20 to 40.
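For reference, the network architecture quoted under Software Dependencies (3 hidden layers, 10 neurons per layer, tanh activations) can be sketched in plain NumPy. This is an illustrative reconstruction only: the names `init_mlp` and `forward` are hypothetical, the weights are random placeholders, and the paper's actual implementation (including automatic differentiation of these networks) is not public.

```python
import numpy as np

# Hypothetical sketch of the networks described in the paper excerpt:
# 3 hidden layers, 10 neurons per layer, tanh activations, linear output.
def init_mlp(in_dim=1, width=10, depth=3, out_dim=1, seed=0):
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [width] * depth + [out_dim]
    # One (weight, bias) pair per layer; small random init as a placeholder.
    return [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    h = np.atleast_2d(x)
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:  # tanh on hidden layers only
            h = np.tanh(h)
    return h
```

The paper states that the gradients of these networks are obtained by automatic differentiation; the forward pass above only illustrates the stated architecture.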
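The piecewise-constant schedule for ρG quoted in the experiment setup can be written as a small helper. The names `rho_G` and `minibatch_size` are hypothetical; the breakpoints and values follow the quoted excerpt, upper bounds are assumed inclusive, and the excerpt's unspecified 14000–17000 range is left as a fall-through to the final value.

```python
# Hypothetical helper reproducing the rho_G schedule quoted in the setup.
def rho_G(i):
    if i <= 7000:
        return (0.03, 0.05, 0.03)
    if i <= 10000:
        return (0.01, 0.01, 0.01)
    if i <= 14000:
        return (0.005, 0.01, 0.005)
    return (0.002, 0.002, 0.002)  # excerpt states 17000 < i <= 21000

# The excerpt also doubles the minibatch size after i = 14000 iterations.
def minibatch_size(i):
    return 20 if i <= 14000 else 40
```

Such stepped schedules simply decay the actor/critic learning rates as training progresses, which matches the decreasing values in the excerpt.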