Mean Field Games Flock! The Reinforcement Learning Way
Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show numerically that our algorithm can learn multi-group or high-dimensional flocking with obstacles. Our main contributions are: (1) we cast the flocking problem into a MFG and propose variations which allow multi-group flocking as well as flocking in high dimension with complex topologies, (2) we introduce the Flock'n RL algorithm that builds upon the Fictitious Play paradigm and involves deep neural networks and RL to solve the model-free flocking MFG, and (3) we illustrate our approach on several numerical examples and evaluate the solution with approximate performance metrics and exploitability. (A sketch of the exploitability metric is given after the table.) |
| Researcher Affiliation | Collaboration | 1) Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL; 2) Princeton University, ORFE; 3) DeepMind Paris; 4) Google Research, Brain Team. sarah.perrin@inria.fr, lauriere@princeton.edu, {perolat, mfgeist, relie, pietquin}@google.com |
| Pseudocode | Yes | Algorithm 1: Generic Fictitious Play in MFGs; Algorithm 2: Flock'n RL. (A minimal fictitious-play sketch is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the Flock'n RL method or a direct link to a code repository. It mentions using 'stable baselines', which is a third-party tool. |
| Open Datasets | No | The paper does not use a pre-existing, publicly available dataset in the traditional sense. Instead, data is generated dynamically within a simulated environment: 'we sample N agents from µj at the beginning of step 1 (i.e. we do not sample new agents from µj every time we need to compute the reward). During the learning, at the beginning of each episode, we sample a starting state s0 ∼ µj.' |
| Dataset Splits | No | The paper does not explicitly provide information on training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'OpenAI Gym environment', 'stable baselines [Hill et al., 2018]', 'Soft Actor Critic (SAC) [Haarnoja et al., 2018]', and 'Neural Spline Flows (NSF) [Durkan et al., 2019]', but does not provide specific version numbers for these dependencies, which are necessary for a reproducible setup. |
| Experiment Setup | Yes | We define a state $s \in S$ as $s = (x, v)$, where $x$ and $v$ are respectively the vectors of positions and velocities. Each coordinate $x_i$ of the position can take any continuous value in the $d$-dimensional box $x_i \in [-100, +100]$, while the velocities are also continuous and clipped, $v_i \in [-1, 1]$. The state space for the positions is a torus... We consider noise $\epsilon^i_t \sim \mathcal{N}(0, \Delta t)$ and the following reward: $r^i_t = f^{\mathrm{flock},i}_{\beta,t} - \lVert u^i_t \rVert_2^2 + v^i_t \min\{x^i_{2,t}, 50\}$, where $x^i_{2,t}$ stands for the second coordinate of the $i$-th agent's position at time $t$. ... in our setting, given a population distribution $\mu$, the objective is to maximize: $J^{\mu}(\pi) = \mathbb{E}_{(s_t, u_t)}\big[\sum_{t=0}^{+\infty} \gamma^t r(x_t, v_t, u_t, \mu_t) + \delta H(\pi(\cdot \mid s_t))\big]$, where $H$ denotes the entropy and $\delta \geq 0$ is a weight. In the experiment, we set the initial velocities perpendicular to the desired ones... (A sketch of this entropy-regularized objective is given after the table.) |
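
The fictitious-play structure behind Algorithms 1 and 2 can be summarized in a few lines. The sketch below is a toy illustration, not the paper's implementation: the population distribution is a small discrete histogram, and `toy_best_response` / `toy_induced_distribution` are made-up stand-ins for the pieces Flock'n RL realizes with SAC and Neural Spline Flows; only the 1/j averaging of the induced distributions follows the generic Fictitious Play scheme.

```python
import numpy as np

# Toy sketch of generic Fictitious Play in a mean field game.
# The distribution is a histogram over a discretized state space; in the
# paper it is a continuous distribution estimated with Neural Spline Flows,
# and the best response is trained with SAC. The two `toy_*` functions are
# illustrative placeholders, not the paper's components.

def toy_best_response(mu_bar: np.ndarray) -> np.ndarray:
    # Placeholder "policy": prefer states that are currently less crowded.
    prefs = np.exp(-mu_bar)
    return prefs / prefs.sum()

def toy_induced_distribution(policy: np.ndarray) -> np.ndarray:
    # Placeholder: the state distribution induced by following the policy.
    return policy

def fictitious_play(mu_0: np.ndarray, iterations: int) -> np.ndarray:
    """Average the induced distributions with weight 1/j, mirroring the averaging step."""
    mu_bar = mu_0.copy()
    for j in range(1, iterations + 1):
        pi_j = toy_best_response(mu_bar)       # best response vs. frozen mu_bar
        mu_j = toy_induced_distribution(pi_j)  # distribution induced by pi_j
        mu_bar = (1.0 - 1.0 / j) * mu_bar + (1.0 / j) * mu_j
    return mu_bar

if __name__ == "__main__":
    print(fictitious_play(np.ones(10) / 10.0, iterations=50))
```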
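
The entropy-regularized objective quoted in the Experiment Setup row can be estimated from a sampled trajectory roughly as follows. This is a minimal sketch under assumptions: the per-step rewards and policy entropies are placeholder arrays, the values of `gamma` and `delta` are arbitrary, and the entropy bonus is assumed to be discounted at the same rate as the reward; in the paper this objective is maximized with SAC against the current (frozen) population distribution.

```python
import numpy as np

# Minimal sketch of a Monte-Carlo estimate of the entropy-regularized return
# sum_t gamma^t * (r_t + delta * H(pi(.|s_t))) from one sampled trajectory.
# Assumption (not stated in the quote): the entropy bonus is discounted
# together with the reward.

def entropy_regularized_return(rewards, entropies, gamma=0.99, delta=0.1):
    """Discounted return with a per-step entropy bonus weighted by delta >= 0."""
    total = 0.0
    for t, (r_t, h_t) in enumerate(zip(rewards, entropies)):
        total += gamma ** t * (r_t + delta * h_t)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = rng.normal(size=100)           # placeholder r(x_t, v_t, u_t, mu_t)
    entropies = rng.uniform(0.5, 1.5, 100)   # placeholder H(pi(.|s_t))
    print(entropy_regularized_return(rewards, entropies))
```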
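
The exploitability used to evaluate the learned solution measures how much a single representative agent can gain by deviating from the population policy while the population distribution stays fixed. The sketch below only fixes this bookkeeping; `evaluate_return` and `train_best_response` are hypothetical callables (in Flock'n RL the best response is itself learned with RL, which is why the resulting metric is only approximate).

```python
def approximate_exploitability(pi_pop, mu, evaluate_return, train_best_response):
    """Approximate exploitability: J^mu(best response) - J^mu(pi_pop),
    where mu is the population distribution induced by pi_pop."""
    j_pop = evaluate_return(pi_pop, mu)   # value of the population policy against mu
    pi_br = train_best_response(mu)       # approximate best response against frozen mu
    j_br = evaluate_return(pi_br, mu)     # value obtained by the deviating agent
    return j_br - j_pop


if __name__ == "__main__":
    # Toy usage with constant placeholders, only to make the sketch executable.
    returns = {"population": 1.0, "best_response": 1.3}
    print(approximate_exploitability(
        "population", mu=None,
        evaluate_return=lambda pi, mu: returns[pi],
        train_best_response=lambda mu: "best_response"))  # prints ~0.3
```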