Flow Matching for Scalable Simulation-Based Inference
Authors: Jonas Wildberger, Maximilian Dax, Simon Buchholz, Stephen R. Green, Jakob H. Macke, Bernhard Schölkopf
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a number of experiments to investigate the performance of FMPE. Our two-pronged approach, which involves a set of benchmark tests and a real-world problem, is designed to probe complementary aspects of the method, covering breadth and depth of applications. First, on an established suite of SBI benchmarks, we show that FMPE performs comparably or better than NPE across most tasks, and in particular exhibits mass-covering posteriors in all cases (Sec. 4). We then push the performance limits of FMPE on a challenging real-world problem by turning to gravitational-wave inference (Sec. 5). |
| Researcher Affiliation | Academia | Jonas Wildberger, Max Planck Institute for Intelligent Systems, Tübingen, Germany (wildberger.jonas@tuebingen.mpg.de); Maximilian Dax, Max Planck Institute for Intelligent Systems, Tübingen, Germany (maximilian.dax@tuebingen.mpg.de); Simon Buchholz, Max Planck Institute for Intelligent Systems, Tübingen, Germany (sbuchholz@tue.mpg.de); Stephen R. Green, University of Nottingham, Nottingham, United Kingdom; Jakob H. Macke, Max Planck Institute for Intelligent Systems & Machine Learning in Science, University of Tübingen, Tübingen, Germany; Bernhard Schölkopf, Max Planck Institute for Intelligent Systems, Tübingen, Germany |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available here. |
| Open Datasets | Yes | We now evaluate FMPE on ten tasks included in the benchmark presented in [46]... For each task, we train three separate FMPE models with simulation budgets N ∈ {10^3, 10^4, 10^5}... We use the data settings described in [7], with a few minor modifications. In particular, we use the waveform model IMRPhenomPv2 [76-78] and the prior displayed in Tab. 4. |
| Dataset Splits | Yes | We reserve 5% of the simulations for validation. |
| Hardware Specification | Yes | We train the NPE and FMPE networks with 5×10^6 simulations for 400 epochs using a batch size of 4096 on an A100 GPU. |
| Software Dependencies | No | The paper mentions building on 'public DINGO code' and using 'dopri5 discretization' but does not specify version numbers for these or other software components. |
| Experiment Setup | Yes | We sweep over the batch size and learning rate (which is particularly important as the simulation budgets differ by orders of magnitude), the network size, and the α parameter for the time prior defined in Section 3.3 (see Tab. 2 for the specific values). Table 2: Sweep values for the hyperparameters for the SBI benchmark: hidden dimensions 2^n for n ∈ {4, ..., 10}; number of blocks 10, ..., 18; batch size 2^n for n ∈ {2, ..., 9}; learning rate 1e-3, 5e-4, 2e-4, 1e-4; α (for time prior) -0.25, -0.5, 0, 1, 4. A hypothetical sketch of this sweep grid appears after the table. |
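
As a rough illustration of the Experiment Setup row, the sketch below enumerates the Table 2 sweep values in Python. The dictionary keys, the use of `itertools.product`, and the assumption of a full Cartesian grid are our own illustration; the paper only lists the candidate values and does not state how the sweep was executed or which tooling was used.

```python
# Hypothetical sketch of the Table 2 sweep grid; the key names are illustrative
# only and do not correspond to identifiers in the authors' code.
import itertools

sweep_space = {
    "hidden_dim": [2 ** n for n in range(4, 11)],    # 16, 32, ..., 1024
    "num_blocks": list(range(10, 19)),               # 10, 11, ..., 18
    "batch_size": [2 ** n for n in range(2, 10)],    # 4, 8, ..., 512
    "learning_rate": [1e-3, 5e-4, 2e-4, 1e-4],
    "time_prior_alpha": [-0.25, -0.5, 0, 1, 4],
}

# Enumerate every configuration (Cartesian product of the candidate values).
# Whether the authors ran the full grid or sampled from it is not stated.
keys = list(sweep_space)
configs = [dict(zip(keys, values))
           for values in itertools.product(*sweep_space.values())]

print(f"{len(configs)} candidate configurations")  # 7 * 9 * 8 * 4 * 5 = 10,080
print(configs[0])
```

Enumerated this way, the grid contains 10,080 combinations, so in practice such a sweep would likely be subsampled or run per simulation budget rather than exhaustively.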