All-in-one simulation-based inference

Authors: Manuel Gloeckler, Michael Deistler, Christian Dietrich Weilbach, Frank Wood, Jakob H. Macke

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated performance in approximating posterior distributions across four benchmark tasks (Lueckmann et al., 2021). For each task, samples for ten ground-truth posteriors are available (Appendix Sec. A2.2), and we assessed performance as classifier two-sample test (C2ST) accuracy to these samples. Here, a score of 0.5 signifies perfect alignment with the ground truth posterior, and 1.0 indicates that a classifier can completely distinguish between the approximation and the ground truth. All results are obtained using the Variance Exploding SDE (VESDE); additional results using the Variance Preserving SDE (VPSDE) can be found in Appendix Sec. A3. See Appendix Sec. A2 for details on the parameterization.
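For context, the C2ST score referenced above can be computed roughly as follows. This is a minimal sketch assuming scikit-learn and a small MLP classifier; the classifier choice, its hyperparameters, and the cross-validation setup are illustrative assumptions, not taken from the paper or the benchmark code.

```python
# Minimal C2ST sketch: a classifier is trained to distinguish ground-truth
# posterior samples from approximate posterior samples; its cross-validated
# accuracy is the score (0.5 ~ indistinguishable, 1.0 ~ fully distinguishable).
# Classifier choice and CV setup are illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def c2st(reference_samples: np.ndarray, approx_samples: np.ndarray, seed: int = 0) -> float:
    X = np.concatenate([reference_samples, approx_samples], axis=0)
    y = np.concatenate([np.zeros(len(reference_samples)), np.ones(len(approx_samples))])
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=seed)
    return float(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())
```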
Researcher Affiliation | Academia | (1) Machine Learning in Science, University of Tübingen and Tübingen AI Center, Tübingen, Germany; (2) Department of Computer Science, University of British Columbia, Vancouver, Canada; (3) Max Planck Institute for Intelligent Systems, Department Empirical Inference, Tübingen, Germany. Correspondence to: Manuel Gloeckler <manuel.gloeckler@uni-tuebingen.de>, Jakob H. Macke <jakob.macke@uni-tuebingen.de>.
Pseudocode | Yes | Algorithm 1: General Guidance
Open Source Code | Yes | Code to reproduce results is available at https://github.com/mackelab/simformer.
Open Datasets | Yes | We evaluated performance in approximating posterior distributions across four benchmark tasks (Lueckmann et al., 2021). For each task, samples for ten ground-truth posteriors are available (Appendix Sec. A2.2), and we assessed performance as classifier two-sample test (C2ST) accuracy to these samples. The tasks Gaussian Linear, Gaussian Mixture, Two Moons, and SLCP were used in Lueckmann et al. (2021).
Dataset Splits | No | Training ceased upon convergence, as indicated by early stopping based on validation loss. However, the paper does not specify the exact splits (e.g., percentages or counts) used for training, validation, and testing.
Hardware Specification | No | This research was enabled in part by technical support and computational resources provided by the Digital Research Alliance of Canada / Compute Canada (alliancecan.ca), the Advanced Research Computing at the University of British Columbia (arc.ubc.ca), Amazon, and Oracle. However, no specific hardware models (e.g., GPU, CPU models, or detailed specifications) are provided.
Software Dependencies | No | We used JAX (Bradbury et al., 2018) as backbone and hydra (Yadan, 2019) to track all configurations. Code to reproduce results is available at https://github.com/mackelab/simformer. We use SBI (Tejero-Cantero et al., 2020) for reference implementations of baselines. While software names are mentioned, specific version numbers for JAX, hydra, or SBI are not provided.
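Because the exact versions are not reported, a re-run would need to record whichever versions happen to be installed. A minimal sketch, assuming the PyPI distribution names `jax`, `hydra-core`, and `sbi`:

```python
# Print the installed versions of the dependencies named in the paper,
# since no version numbers are reported. Distribution names are assumptions.
from importlib.metadata import PackageNotFoundError, version

for dist in ("jax", "hydra-core", "sbi"):
    try:
        print(f"{dist}=={version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```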
Experiment Setup | Yes | For implementing Neural Posterior Estimation (NPE), Neural Ratio Estimation (NRE), and Neural Likelihood Estimation (NLE), we utilize the sbi library (Tejero-Cantero et al., 2020), adopting default parameters but opting for a more expressive neural spline flow for NPE and NLE. Each method was trained using the provided training loop with a batch size of 1000 and an Adam optimizer. Training ceased upon convergence, as indicated by early stopping based on validation loss. The employed transformer model features a token dimension of 50 and represents diffusion time through a 128-dimensional random Gaussian Fourier embedding. It comprises 6 layers and 4 heads with an attention size of 10, and a widening factor of 3, implying that the feed-forward block expands to a hidden dimension of 150. For the Lotka-Volterra, SIR, and Hodgkin-Huxley tasks, we increased the number of layers to 8. Similar to the above, we used a training batch size of 1000 and an Adam optimizer. In all our experiments, we sampled the condition mask MC as follows: At every training batch, we selected uniformly at random a mask corresponding to the joint, the posterior, the likelihood or two random masks. The random masks were drawn from a Bernoulli distribution with p = 0.3 and p = 0.7. In our experiments, we found this to work slightly better than just random sampling and sufficiently diverse to still represent all the conditionals. The edge mask ME is chosen to match the generative process (see Fig. A4). The undirected variant was obtained by symmetrization. Note that this is the only input we provide; additional necessary dependencies, e.g., due to conditioning, are algorithmically determined (see Sec. A1.1). For inference, we solved the reverse SDE using an Euler-Maruyama discretization. We use 500 steps by default; accuracy for different budgets is shown in Fig. A7.
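Two of the steps described above, the condition-mask sampling and the Euler-Maruyama integration of the VESDE reverse SDE, can be sketched in JAX as follows. The function names, the sigma_min/sigma_max defaults, and the mask encoding (1 = conditioned-on variable) are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the condition-mask sampling and the Euler-Maruyama reverse-SDE
# sampler described above. All names and default values are illustrative.
import jax
import jax.numpy as jnp


def sample_condition_mask(key, n_theta, n_x):
    """Pick uniformly among joint, posterior, likelihood, or two random masks
    (Bernoulli with p = 0.3 and p = 0.7); 1 marks a conditioned-on variable."""
    n = n_theta + n_x
    joint = jnp.zeros(n, dtype=bool)                              # model p(theta, x)
    posterior = jnp.concatenate([jnp.zeros(n_theta, dtype=bool),
                                 jnp.ones(n_x, dtype=bool)])      # condition on x
    likelihood = jnp.concatenate([jnp.ones(n_theta, dtype=bool),
                                  jnp.zeros(n_x, dtype=bool)])    # condition on theta
    key_choice, key_p3, key_p7 = jax.random.split(key, 3)
    masks = jnp.stack([joint, posterior, likelihood,
                       jax.random.bernoulli(key_p3, 0.3, (n,)),
                       jax.random.bernoulli(key_p7, 0.7, (n,))])
    return masks[jax.random.randint(key_choice, (), 0, masks.shape[0])]


def sample_reverse_vesde(key, score_fn, x_init, n_steps=500,
                         sigma_min=1e-2, sigma_max=10.0):
    """Euler-Maruyama integration of the VESDE reverse SDE from t = 1 to ~0.
    The VESDE has zero forward drift, so each backward step is
    x <- x + g(t)^2 * score(x, t) * dt + g(t) * sqrt(dt) * z."""
    dt = 1.0 / n_steps
    log_ratio = jnp.log(sigma_max / sigma_min)

    def g(t):  # diffusion coefficient of the VESDE
        return sigma_min * (sigma_max / sigma_min) ** t * jnp.sqrt(2.0 * log_ratio)

    def step(x, inp):
        t, key_t = inp
        z = jax.random.normal(key_t, x.shape)
        x = x + (g(t) ** 2) * score_fn(x, t) * dt + g(t) * jnp.sqrt(dt) * z
        return x, None

    ts = jnp.linspace(1.0, dt, n_steps)
    x_final, _ = jax.lax.scan(step, x_init, (ts, jax.random.split(key, n_steps)))
    return x_final
```

Under these assumptions, x_init would be drawn from N(0, sigma_max^2 I) for the VESDE, and conditioning would, roughly speaking, amount to holding the entries selected by the condition mask fixed at their observed values throughout the integration.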