Greedy inference with structure-exploiting lazy maps

Authors: Michael Brennan, Daniele Bigoni, Olivier Zahm, Alessio Spantini, Youssef Marzouk

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4, "Numerical examples": We present numerical demonstrations of the lazy framework as follows. We first illustrate Algorithm 2 on a 2-dimensional toy example, where we show the progressive Gaussianization of the posterior using a sequence of 1-dimensional lazy maps. We then demonstrate the benefits of the lazy framework (Algorithms 1 and 2) in several challenging inference problems. We consider Bayesian logistic regression and a Bayesian neural network, and compare the performance of a baseline transport map to lazy maps using the same underlying transport class. We measure performance improvements in four ways: (1) the final ELBO achieved by the transport maps after training; (2, 3) the final trace diagnostics (1/2) Tr(H^B_ℓ) and (1/2) Tr(H_ℓ), which bound the error D_KL(π ‖ (T_ℓ)♯ρ); and (4) the variance diagnostic (1/2) V_ρ[log(ρ / T_ℓ^♯π)], which is an asymptotic approximation of D_KL((T_ℓ)♯ρ ‖ π) as (T_ℓ)♯ρ → π (see [40]). Finally, we highlight the advantages of greedily training lazy maps in a nonlinear problem defined by a high-dimensional elliptic partial differential equation (PDE), often used for testing high-dimensional inference methods [4, 16, 53].
Researcher Affiliation | Academia | Michael C. Brennan (Massachusetts Institute of Technology, Cambridge, MA 02139, USA; mcbrenn@mit.edu); Daniele Bigoni (Massachusetts Institute of Technology, Cambridge, MA 02139, USA; dabi@mit.edu); Olivier Zahm (Université Grenoble Alpes, INRIA, CNRS, LJK, 38000 Grenoble, France; olivier.zahm@inria.fr); Alessio Spantini (Massachusetts Institute of Technology, Cambridge, MA 02139, USA; alessio.spantini@gmail.com); Youssef Marzouk (Massachusetts Institute of Technology, Cambridge, MA 02139, USA; ymarz@mit.edu)
Pseudocode | Yes | Algorithm 1: Construction of a lazy map. [...] Algorithm 2: Construction of a deeply lazy map.
Open Source Code | Yes | Code for the numerical examples can be found at https://github.com/MichaelCBrennan/lazymaps and http://bit.ly/2QlelXF.
Open Datasets | Yes | We consider a high-dimensional Bayesian logistic regression problem using the UCI Parkinson's disease classification data [1], studied in [49]. [...] UCI yacht hydrodynamics data set [2]. [...] Data for Sections 4.4, G.4, and G.5 can be downloaded at http://bit.ly/2X09Ns8, http://bit.ly/2HytQc0, and http://bit.ly/2Eug5ZR.
Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details on training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it mentions only the software frameworks used.
Software Dependencies | No | The paper mentions software such as the Transport Maps framework [7], the TensorFlow Probability library [19], FEniCS [37], and dolfin-adjoint [22], but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We consider degree 3 polynomial maps as the underlying transport class. We use Gauss quadrature rules of order 10 for the discretization of the KL divergence and the approximation of H^B_ℓ (m = 121 in Algorithms 3 and 5). [...] We choose a relatively uninformative prior of N(0, 10² I_d). [...] In G3-IAF, each layer has rank r = 200. [...] Our inference problem is 581-dimensional, given a network input dimension of 6, one hidden layer of dimension 20, and an output layer of dimension 1. We use sigmoid activations in the input and hidden layers, and a linear output layer. Model parameters are endowed with independent Gaussian priors with zero mean and variance 100. [...] Expectations appearing in the algorithm are discretized with m = 500 Monte Carlo samples. To avoid wasting work in the early iterations, we use affine maps of rank r = 4 for iterations ℓ = 1, ..., 5. We then switch to polynomial maps of degree 2 and rank r = 2 for the remaining iterations.
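The variance diagnostic quoted above, (1/2) V_ρ[log(ρ / T^♯π)], can be illustrated concretely. Below is a minimal Monte Carlo sketch (not the authors' code; the toy 2-D Gaussian target, the linear candidate map, and all function names are illustrative) showing that the diagnostic vanishes when the map is exact, i.e. T♯ρ = π, and is strictly positive otherwise:

```python
import numpy as np

# Hedged sketch of the variance diagnostic (1/2) Var_rho[log rho - log T#pi]
# for reference rho = N(0, I) and a toy Gaussian target pi = N(0, Sigma).
# For a linear map T(x) = L x, the pullback density is
#   T#pi(x) = pi(L x) * |det L|.

rng = np.random.default_rng(0)
d = 2
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # illustrative target covariance
L = np.linalg.cholesky(Sigma)               # exact map: T#rho = pi

def log_rho(x):
    # standard Gaussian log-density, row-wise
    return -0.5 * np.sum(x**2, axis=1) - 0.5 * d * np.log(2 * np.pi)

def log_pi(y):
    P = np.linalg.inv(Sigma)
    quad = np.einsum('ni,ij,nj->n', y, P, y)
    norm = d * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma))
    return -0.5 * quad - 0.5 * norm

def variance_diagnostic(T_matrix, m=50_000):
    x = rng.standard_normal((m, d))         # samples from rho
    y = x @ T_matrix.T                      # T(x)
    log_pullback = log_pi(y) + np.log(abs(np.linalg.det(T_matrix)))
    return 0.5 * np.var(log_rho(x) - log_pullback)

print(variance_diagnostic(L))          # exact map: diagnostic is ~0
print(variance_diagnostic(np.eye(d)))  # identity map: strictly positive
```

With the exact Cholesky map the integrand log ρ − log T^♯π is identically zero, so the diagnostic is zero up to floating-point error; any mismatch between T♯ρ and π makes it positive.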
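The quoted setup's "Gauss quadrature rules of order 10 ... m = 121" is consistent with an 11-point tensorized Gauss-Hermite rule in two dimensions (11² = 121 nodes). A hedged sketch of that discretization of expectations under the standard Gaussian reference, assuming this reading of the quadrature order:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hedged sketch: tensor-product Gauss-Hermite quadrature approximating
# E_rho[f] for rho = N(0, I_2), using 11 nodes per dimension (121 total).

n = 11
x1, w1 = hermegauss(n)        # probabilists' Hermite: weight exp(-x^2/2)
w1 = w1 / np.sqrt(2 * np.pi)  # normalize so the weights sum to 1

# 2-D tensor-product grid and weights
X1, X2 = np.meshgrid(x1, x1, indexing='ij')
W = np.outer(w1, w1).ravel()
pts = np.column_stack([X1.ravel(), X2.ravel()])  # 121 quadrature points

def expect(f):
    # quadrature approximation of E_rho[f]
    return np.sum(W * f(pts))

# sanity checks against known standard-Gaussian moments
print(expect(lambda p: p[:, 0]**2))  # E[x^2] = 1
print(expect(lambda p: p[:, 0]**4))  # E[x^4] = 3
```

An n-point Gauss rule integrates polynomials up to degree 2n − 1 exactly, so an 11-point rule handles the polynomial integrands arising from degree-3 maps comfortably; the moment checks above are exact up to floating-point error.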