Mean Field Games Flock! The Reinforcement Learning Way
Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show numerically that our algorithm can learn multi-group or high-dimensional flocking with obstacles. Our main contributions are: (1) we cast the flocking problem into a MFG and propose variations which allow multi-group flocking as well as flocking in high dimension with complex topologies, (2) we introduce the Flock'n RL algorithm that builds upon the Fictitious Play paradigm and involves deep neural networks and RL to solve the model-free flocking MFG, and (3) we illustrate our approach on several numerical examples and evaluate the solution with approximate performance metrics and exploitability. (A sketch of the exploitability metric is given after the table.) |
| Researcher Affiliation | Collaboration | 1) Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL; 2) Princeton University, ORFE; 3) DeepMind Paris; 4) Google Research, Brain Team. sarah.perrin@inria.fr, lauriere@princeton.edu, {perolat, mfgeist, relie, pietquin}@google.com |
| Pseudocode | Yes | Algorithm 1: Generic Fictitious Play in MFGs; Algorithm 2: Flock'n RL. (A minimal fictitious-play sketch is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the Flock'n RL method or a direct link to a code repository. It mentions using 'stable baselines', which is a third-party tool. |
| Open Datasets | No | The paper does not use a pre-existing, publicly available dataset in the traditional sense. Instead, data is generated dynamically within a simulated environment: 'we sample N agents from µj at the beginning of step 1 (i.e. we do not sample new agents from µj every time we need to compute the reward). During the learning, at the beginning of each episode, we sample a starting state s0 ∼ µj.' |
| Dataset Splits | No | The paper does not explicitly provide information on training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'OpenAI Gym environment', 'stable baselines [Hill et al., 2018]', 'Soft Actor Critic (SAC) [Haarnoja et al., 2018]', and 'Neural Spline Flows (NSF) [Durkan et al., 2019]', but does not provide specific version numbers for these dependencies, which are necessary for a reproducible setup. |
| Experiment Setup | Yes | We define a state $s \in S$ as $s = (x, v)$, where $x$ and $v$ are respectively the vectors of positions and velocities. Each coordinate $x_i$ of the position can take any continuous value in the $d$-dimensional box $x_i \in [-100, +100]$, while the velocities are also continuous and clipped, $v_i \in [-1, 1]$. The state space for the positions is a torus... We consider noise $\epsilon^i_t \sim \mathcal{N}(0, \Delta t)$ and the following reward: $r^i_t = f^{\mathrm{flock},i}_{\beta,t} - \lVert u^i_t \rVert_2^2 + v^i_t \min\{x^i_{2,t}, 50\}$, where $x^i_{2,t}$ stands for the second coordinate of the $i$-th agent's position at time $t$. ... in our setting, given a population distribution $\mu$, the objective is to maximize: $J^{\mu}(\pi) = \mathbb{E}_{(s_t, u_t)}\big[\sum_{t=0}^{+\infty} \gamma^t r(x_t, v_t, u_t, \mu_t) + \delta H(\pi(\cdot \mid s_t))\big]$, where $H$ denotes the entropy and $\delta \geq 0$ is a weight. In the experiment, we set the initial velocities perpendicular to the desired ones... (A sketch of this entropy-regularized objective is given after the table.) |
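
The fictitious-play structure behind Algorithms 1 and 2 can be summarized in a few lines. The sketch below is a toy illustration, not the paper's implementation: the population distribution is a small discrete histogram, and `toy_best_response` / `toy_induced_distribution` are made-up stand-ins for the pieces Flock'n RL realizes with SAC and Neural Spline Flows; only the 1/j averaging of the induced distributions follows the generic Fictitious Play scheme.

```python
import numpy as np

# Toy sketch of generic Fictitious Play in a mean field game.
# The distribution is a histogram over a discretized state space; in the
# paper it is a continuous distribution estimated with Neural Spline Flows,
# and the best response is trained with SAC. The two `toy_*` functions are
# illustrative placeholders, not the paper's components.

def toy_best_response(mu_bar: np.ndarray) -> np.ndarray:
    # Placeholder "policy": prefer states that are currently less crowded.
    prefs = np.exp(-mu_bar)
    return prefs / prefs.sum()

def toy_induced_distribution(policy: np.ndarray) -> np.ndarray:
    # Placeholder: the state distribution induced by following the policy.
    return policy

def fictitious_play(mu_0: np.ndarray, iterations: int) -> np.ndarray:
    """Average the induced distributions with weight 1/j, mirroring the averaging step."""
    mu_bar = mu_0.copy()
    for j in range(1, iterations + 1):
        pi_j = toy_best_response(mu_bar)       # best response vs. frozen mu_bar
        mu_j = toy_induced_distribution(pi_j)  # distribution induced by pi_j
        mu_bar = (1.0 - 1.0 / j) * mu_bar + (1.0 / j) * mu_j
    return mu_bar

if __name__ == "__main__":
    print(fictitious_play(np.ones(10) / 10.0, iterations=50))
```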
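
The entropy-regularized objective quoted in the Experiment Setup row can be estimated from a sampled trajectory roughly as follows. This is a minimal sketch under assumptions: the per-step rewards and policy entropies are placeholder arrays, the values of `gamma` and `delta` are arbitrary, and the entropy bonus is assumed to be discounted at the same rate as the reward; in the paper this objective is maximized with SAC against the current (frozen) population distribution.

```python
import numpy as np

# Minimal sketch of a Monte-Carlo estimate of the entropy-regularized return
# sum_t gamma^t * (r_t + delta * H(pi(.|s_t))) from one sampled trajectory.
# Assumption (not stated in the quote): the entropy bonus is discounted
# together with the reward.

def entropy_regularized_return(rewards, entropies, gamma=0.99, delta=0.1):
    """Discounted return with a per-step entropy bonus weighted by delta >= 0."""
    total = 0.0
    for t, (r_t, h_t) in enumerate(zip(rewards, entropies)):
        total += gamma ** t * (r_t + delta * h_t)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = rng.normal(size=100)           # placeholder r(x_t, v_t, u_t, mu_t)
    entropies = rng.uniform(0.5, 1.5, 100)   # placeholder H(pi(.|s_t))
    print(entropy_regularized_return(rewards, entropies))
```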
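
The exploitability used to evaluate the learned solution measures how much a single representative agent can gain by deviating from the population policy while the population distribution stays fixed. The sketch below only fixes this bookkeeping; `evaluate_return` and `train_best_response` are hypothetical callables (in Flock'n RL the best response is itself learned with RL, which is why the resulting metric is only approximate).

```python
def approximate_exploitability(pi_pop, mu, evaluate_return, train_best_response):
    """Approximate exploitability: J^mu(best response) - J^mu(pi_pop),
    where mu is the population distribution induced by pi_pop."""
    j_pop = evaluate_return(pi_pop, mu)   # value of the population policy against mu
    pi_br = train_best_response(mu)       # approximate best response against frozen mu
    j_br = evaluate_return(pi_br, mu)     # value obtained by the deviating agent
    return j_br - j_pop


if __name__ == "__main__":
    # Toy usage with constant placeholders, only to make the sketch executable.
    returns = {"population": 1.0, "best_response": 1.3}
    print(approximate_exploitability(
        "population", mu=None,
        evaluate_return=lambda pi, mu: returns[pi],
        train_best_response=lambda mu: "best_response"))  # prints ~0.3
```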