Mean Field Games Flock! The Reinforcement Learning Way

Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show numerically that our algorithm can learn multi-group or high-dimensional flocking with obstacles. Our main contributions are: (1) we cast the flocking problem into a MFG and propose variations which allow multi-group flocking as well as flocking in high dimension with complex topologies, (2) we introduce the Flock'n RL algorithm that builds upon the Fictitious Play paradigm and involves deep neural networks and RL to solve the model-free flocking MFG, and (3) we illustrate our approach on several numerical examples and evaluate the solution with approximate performance metrics and exploitability.
Researcher Affiliation | Collaboration | (1) Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL; (2) Princeton University, ORFE; (3) DeepMind Paris; (4) Google Research, Brain Team. sarah.perrin@inria.fr, lauriere@princeton.edu, {perolat, mfgeist, relie, pietquin}@google.com
Pseudocode | Yes | Algorithm 1: Generic Fictitious Play in MFGs; Algorithm 2: Flock'n RL. A minimal sketch of the Fictitious Play loop is given after the table.
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the Flock'n RL method, nor a direct link to a code repository. It mentions using 'Stable Baselines', which is a third-party tool.
Open Datasets | No | The paper does not use a pre-existing, publicly available dataset in the traditional sense. Instead, data is generated dynamically within a simulated environment: 'we sample N agents from $\mu_j$ at the beginning of step 1 (i.e. we do not sample new agents from $\mu_j$ every time we need to compute the reward). During the learning, at the beginning of each episode, we sample a starting state $s_0 \sim \mu_j$.' A sketch of this sampling scheme is given after the table.
Dataset Splits | No | The paper does not explicitly provide information on training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as the 'OpenAI Gym environment', 'Stable Baselines [Hill et al., 2018]', 'Soft Actor Critic (SAC) [Haarnoja et al., 2018]', and 'Neural Spline Flows (NSF) [Durkan et al., 2019]', but does not provide specific version numbers for these dependencies, which are necessary for a reproducible setup. An illustrative SAC invocation is given after the table.
Experiment Setup | Yes | We define a state $s \in \mathcal{S}$ as $s = (x, v)$, where $x$ and $v$ are respectively the vectors of positions and velocities. Each coordinate $x_i$ of the position can take any continuous value in the $d$-dimensional box, $x_i \in [-100, +100]$, while the velocities are also continuous and clipped, $v_i \in [-1, 1]$. The state space for the positions is a torus... We consider noise $\epsilon^i_t \sim \mathcal{N}(0, \Delta t)$ and the following reward: $r^i_t = f^{flock,i}_{\beta,t} - \|u^i_t\|_2^2 + v^i_t - \min\{x^i_{2,t}, 50\}$, where $x^i_{2,t}$ stands for the second coordinate of the $i$-th agent's position at time $t$. ... in our setting, given a population distribution $\mu$, the objective is to maximize: $J_\mu(\pi) = \mathbb{E}_{(s_t, u_t)}\big[\sum_{t=0}^{+\infty} \gamma^t r(x_t, v_t, u_t, \mu_t) + \delta H(\pi(\cdot \mid s_t))\big]$, where $H$ denotes the entropy and $\delta \geq 0$ is a weight. In the experiment, we set the initial velocities perpendicular to the desired ones... A sketch of these dynamics and this reward is given after the table.
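
As referenced in the Pseudocode row, the paper's Algorithm 1 is the generic Fictitious Play loop that Flock'n RL instantiates. Below is a minimal, self-contained sketch of that loop; the discretized state space and the closed-form `best_response` and `induced_distribution` helpers are stand-in assumptions, not the paper's components (in Flock'n RL the best response is computed with deep RL and the induced distribution is estimated with Normalizing Flows).

```python
import numpy as np

# Toy sketch of Algorithm 1 (generic Fictitious Play in MFGs).
# The population distribution is a histogram over a discretized state space.

N_STATES = 10

def best_response(mu_bar):
    # Placeholder: prefer the least crowded states (softmax of -mu_bar).
    # In Flock'n RL this step is a deep RL subroutine (SAC), not a closed form.
    scores = -mu_bar
    pi = np.exp(scores - scores.max())
    return pi / pi.sum()

def induced_distribution(pi):
    # Placeholder: identify the induced state distribution with the policy.
    # In the paper it is estimated from trajectories with Normalizing Flows.
    return pi

def fictitious_play(n_iterations):
    mu_bar = np.full(N_STATES, 1.0 / N_STATES)    # uniform initial distribution
    for j in range(1, n_iterations + 1):
        pi_j = best_response(mu_bar)              # (1) best response vs. average
        mu_j = induced_distribution(pi_j)         # (2) distribution it induces
        mu_bar = ((j - 1) * mu_bar + mu_j) / j    # (3) running-average update
    return mu_bar

print(fictitious_play(50))
```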
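
The Open Datasets row quotes the paper's on-the-fly data generation. The following sketch illustrates that scheme; the helper names and the Gaussian stand-in for $\mu_j$ are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_agents(mu_j_sampler, n_agents):
    # Drawn once at the beginning of the Fictitious Play step and reused
    # for every reward evaluation (no resampling per reward computation).
    return np.stack([mu_j_sampler(rng) for _ in range(n_agents)])

def reset_episode(mu_j_sampler):
    # Called at the beginning of each RL episode: s0 ~ mu_j.
    return mu_j_sampler(rng)

# Stand-in sampler over 2D (position, velocity) states, i.e. s = (x, v).
mu_j_sampler = lambda rng: rng.normal(size=4)
population = sample_agents(mu_j_sampler, n_agents=500)
s0 = reset_episode(mu_j_sampler)
```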
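
For the Software Dependencies row: the paper names Stable Baselines and SAC but gives no versions. Purely as an illustration of the toolchain (the environment name, hyperparameters, and timestep budget here are assumptions, not the paper's settings), SAC from Stable Baselines is typically invoked like this:

```python
import gym
from stable_baselines import SAC  # Stable Baselines v2 (TensorFlow 1.x era)

# Stand-in continuous-control task; the paper uses its own flocking
# environment exposed through the OpenAI Gym interface.
env = gym.make("Pendulum-v0")

model = SAC("MlpPolicy", env, verbose=0)  # Soft Actor-Critic (Haarnoja et al., 2018)
model.learn(total_timesteps=10_000)
```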
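
Finally, for the Experiment Setup row, here is a compact 2D sketch of the quoted dynamics and reward. Everything below is an assumption-laden reconstruction: the Euler step size `DT`, the `flock_reward` stand-in for $f^{flock,i}_{\beta,t}$, and the coordinate choices in the last two reward terms (the quoted formula is only partially recoverable from the extraction).

```python
import numpy as np

BOX, V_MAX, DT = 100.0, 1.0, 0.1   # position box, velocity clip, step size (assumed)

def flock_reward(v, population_v):
    # Stand-in for the Cucker-Smale-style flocking term: reward alignment
    # with the mean velocity of the sampled population.
    return -float(np.sum((v - population_v.mean(axis=0)) ** 2))

def step(x, v, u, population_v, rng):
    eps = rng.normal(0.0, np.sqrt(DT), size=v.shape)   # noise eps ~ N(0, dt)
    v = np.clip(v + u * DT + eps, -V_MAX, V_MAX)       # clipped velocities
    x = ((x + v * DT + BOX) % (2 * BOX)) - BOX         # torus over [-100, 100]^2
    r = (flock_reward(v, population_v)
         - float(np.sum(u ** 2))                       # control cost ||u||^2_2
         + v[0] - min(x[1], 50.0))                     # reconstructed last terms
    return x, v, r

# Tiny usage example with a random stand-in population of 100 agents.
rng = np.random.default_rng(0)
pop_v = rng.uniform(-1.0, 1.0, size=(100, 2))
x, v, r = step(np.zeros(2), np.zeros(2), rng.normal(size=2), pop_v, rng)
print(x, v, r)
```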