Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Actor-Critic learning for mean-field control in continuous time

Authors: Noufel FRIKHA, Maximilien GERMAIN, Mathieu LAURIERE, Huyen PHAM, Xuanye SONG

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we illustrate the results of our algorithms with some numerical experiments on concrete examples. Keywords: Mean-field control, reinforcement learning, policy gradient, linear-quadratic, actor-critic algorithms
Researcher Affiliation Collaboration Noufel FRIKHA Noufel.Frikha at univ-paris1.fr CES, UMR 8174, Université Paris 1 Panthéon-Sorbonne Maximilien GERMAIN maximilien.germain at gmail.com Morgan Stanley Mathieu LAURIERE mathieu.lauriere at nyu.edu NYU Shanghai Huyên PHAM huyen.pham at polytechnique.edu CMAP, École Polytechnique Xuanye SONG xuanye.song at ntu.edu.sg NTU, Singapore
Pseudocode Yes Algorithm 1: Offline actor-critic mean-field algorithm Algorithm 2: Online actor-critic mean-field algorithm
Open Source Code No The text is ambiguous or lacks a clear, affirmative statement of release.
Open Datasets No We implement our actor-critic algorithms with a simulator of X for coefficients equal to T = 1, γ = 1, B = B̄ = 0.6, I = 0.4, P = Q = 1, ... The simulator for X is based on the real mean-field model:
Dataset Splits No for each episode i = 1, ..., N do ... We simulate 10 populations, each consisting of 10^4 agents.
Hardware Specification No No hardware is mentioned
Software Dependencies No We use neural networks with 3 hidden layers, 10 neurons per layer and tanh activation functions. ... The derivatives w.r.t. η of K_η, R_η, hence of J_η, as well as the derivative w.r.t. θ of log p_θ, are computed by automatic differentiation.
Experiment Setup Yes Here we used the following parameters: µ_{t_k} was initialized at 0; the number of episodes was N = 21000; the time horizon was T = 1 and the time step was Δt = 0.02. The values of the model parameters were as described above. The learning rates (ρS, ρE, ρG) and λ were taken as follows: ρS = 0.2 constant, and at iteration i, (0.01, 0.1, 0.01, 0.2) if i ≤ 500 and (0.1, 0.1, 0.1, 0.1) if 500 < i ≤ 21000; ρG(i) = (0.03, 0.05, 0.03) if i ≤ 7000, (0.01, 0.01, 0.01) if 7000 < i ≤ 10000, (0.005, 0.01, 0.005) if 10000 < i ≤ 14000, (0.002, 0.002, 0.002) if 17000 < i ≤ 21000; λ = 0.1 if i ≤ 8000, 0.01 if 8000 < i ≤ 14000, 0.001 if 14000 < i ≤ 21000. Moreover, after i = 14000 iterations, we also increase the size of the minibatch from 20 to 40.
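For reference, the network architecture quoted under Software Dependencies (3 hidden layers, 10 neurons per layer, tanh activations) can be sketched in plain NumPy. This is an illustrative reconstruction only: the names `init_mlp` and `forward` are hypothetical, the weights are random placeholders, and the paper's actual implementation (including automatic differentiation of these networks) is not public.

```python
import numpy as np

# Hypothetical sketch of the networks described in the paper excerpt:
# 3 hidden layers, 10 neurons per layer, tanh activations, linear output.
def init_mlp(in_dim=1, width=10, depth=3, out_dim=1, seed=0):
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [width] * depth + [out_dim]
    # One (weight, bias) pair per layer; small random init as a placeholder.
    return [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    h = np.atleast_2d(x)
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:  # tanh on hidden layers only
            h = np.tanh(h)
    return h
```

The paper states that the gradients of these networks are obtained by automatic differentiation; the forward pass above only illustrates the stated architecture.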
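The piecewise-constant schedule for ρG quoted in the experiment setup can be written as a small helper. The names `rho_G` and `minibatch_size` are hypothetical; the breakpoints and values follow the quoted excerpt, upper bounds are assumed inclusive, and the excerpt's unspecified 14000–17000 range is left as a fall-through to the final value.

```python
# Hypothetical helper reproducing the rho_G schedule quoted in the setup.
def rho_G(i):
    if i <= 7000:
        return (0.03, 0.05, 0.03)
    if i <= 10000:
        return (0.01, 0.01, 0.01)
    if i <= 14000:
        return (0.005, 0.01, 0.005)
    return (0.002, 0.002, 0.002)  # excerpt states 17000 < i <= 21000

# The excerpt also doubles the minibatch size after i = 14000 iterations.
def minibatch_size(i):
    return 20 if i <= 14000 else 40
```

Such stepped schedules simply decay the actor/critic learning rates as training progresses, which matches the decreasing values in the excerpt.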