Greedy inference with structure-exploiting lazy maps
Authors: Michael Brennan, Daniele Bigoni, Olivier Zahm, Alessio Spantini, Youssef Marzouk
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Numerical examples): We present numerical demonstrations of the lazy framework as follows. We first illustrate Algorithm 2 on a 2-dimensional toy example, where we show the progressive Gaussianization of the posterior using a sequence of 1-dimensional lazy maps. We then demonstrate the benefits of the lazy framework (Algorithms 1 and 2) in several challenging inference problems. We consider Bayesian logistic regression and a Bayesian neural network, and compare the performance of a baseline transport map to lazy maps using the same underlying transport class. We measure performance improvements in four ways: (1) the final ELBO achieved by the transport maps after training; (2 and 3) the final trace diagnostics $\tfrac{1}{2}\mathrm{Tr}(H_B^\ell)$ and $\tfrac{1}{2}\mathrm{Tr}(H^\ell)$, which bound the error $D_{\mathrm{KL}}(\pi \,\|\, (T_\ell)_\sharp\rho)$; and (4) the variance diagnostic $\tfrac{1}{2}\mathbb{V}_\rho[\log(\rho/T_\ell^\sharp\pi)]$, which is an asymptotic approximation of $D_{\mathrm{KL}}((T_\ell)_\sharp\rho \,\|\, \pi)$ as $(T_\ell)_\sharp\rho \to \pi$ (see [40]). Finally, we highlight the advantages of greedily training lazy maps in a nonlinear problem defined by a high-dimensional elliptic partial differential equation (PDE), often used for testing high-dimensional inference methods [4, 16, 53]. (A hedged sketch of the variance diagnostic follows the table.) |
| Researcher Affiliation | Academia | Michael C. Brennan Massachusetts Institute of Technology Cambridge, MA 02139 USA mcbrenn@mit.edu Daniele Bigoni Massachusetts Institute of Technology Cambridge, MA 02139 USA dabi@mit.edu Olivier Zahm Université Grenoble Alpes, INRIA, CNRS, LJK 38000 Grenoble, France olivier.zahm@inria.fr Alessio Spantini Massachusetts Institute of Technology Cambridge, MA 02139 USA alessio.spantini@gmail.com Youssef Marzouk Massachusetts Institute of Technology Cambridge, MA 02139 USA ymarz@mit.edu |
| Pseudocode | Yes | Algorithm 1: Construction of a lazy map. [...] Algorithm 2: Construction of a deeply lazy map. (A hedged sketch of the greedy construction follows the table.) |
| Open Source Code | Yes | Code for the numerical examples can be found at https://github.com/MichaelCBrennan/lazymaps and http://bit.ly/2QlelXF. |
| Open Datasets | Yes | We consider a high-dimensional Bayesian logistic regression problem using the UCI Parkinson's disease classification data [1], studied in [49]. [...] UCI yacht hydrodynamics data set [2]. [...] Data for Sections 4.4, G.4, and G.5 can be downloaded at http://bit.ly/2X09Ns8, http://bit.ly/2HytQc0 and http://bit.ly/2Eug5ZR. |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details on training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning the software frameworks used. |
| Software Dependencies | No | The paper mentions software like the 'Transport Maps framework [7]', 'TensorFlow Probability library [19]', 'FEniCS [37]', and 'dolfin-adjoint [22]' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We consider degree 3 polynomial maps as the underlying transport class. We use Gauss quadrature rules of order 10 for the discretization of the KL divergence and the approximation of $H_B^\ell$ ($m = 121$ in Algorithms 3 and 5). [...] We choose a relatively uninformative prior of $\mathcal{N}(0, 10^2 I_d)$. [...] In G3-IAF, each layer has rank $r = 200$. [...] Our inference problem is 581-dimensional, given a network input dimension of 6, one hidden layer of dimension 20, and an output layer of dimension 1. We use sigmoid activations in the input and hidden layer, and a linear output layer. Model parameters are endowed with independent Gaussian priors with zero mean and variance 100. [...] Expectations appearing in the algorithm are discretized with $m = 500$ Monte Carlo samples. To not waste work in the early iterations, we use affine maps of rank $r = 4$ for iterations $\ell = 1, \ldots, 5$. Then we switch to polynomial maps of degree 2 and rank $r = 2$ for the remaining iterations. (A hedged quadrature sketch follows the table.) |
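The greedy construction summarized in Algorithms 1 and 2 above admits a compact reading: at each iteration, estimate a diagnostic matrix, rotate into its dominant eigenspace, fit a low-dimensional ("lazy") map on the leading rotated coordinates, and stack the result with the layers found so far. The sketch below is an assumption-laden paraphrase of that loop, not the authors' implementation; `estimate_H` and `fit_low_dim_map` are hypothetical stand-ins for the paper's diagnostic-matrix estimator and its transport-class optimizer.

```python
# Hedged sketch of the greedy "deeply lazy" construction (Algorithms 1-2):
# rotate into the dominant eigenspace of a diagnostic matrix H, fit a
# rank-r map on the leading rotated coordinates, act as the identity on
# the rest, and stack the resulting layers.
import numpy as np

def deeply_lazy_map(log_posterior, n_layers, rank, estimate_H, fit_low_dim_map):
    layers = []
    for _ in range(n_layers):
        H = estimate_H(log_posterior, layers)       # d x d diagnostic matrix
        _, eigvecs = np.linalg.eigh(H)              # eigenvalues in ascending order
        U = eigvecs[:, ::-1]                        # dominant directions first
        tau = fit_low_dim_map(log_posterior, layers, U, rank)

        def layer(z, U=U, tau=tau, r=rank):
            x = U.T @ z                             # rotate into the diagnostic basis
            x[:r] = tau(x[:r])                      # nonlinear map on r coordinates only
            return U @ x                            # identity on the remaining d - r

        layers.append(layer)
    return layers                                   # layers to be composed into T_ell
```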
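The variance diagnostic quoted in the "Research Type" row, $\tfrac{1}{2}\mathbb{V}_\rho[\log(\rho/T_\ell^\sharp\pi)]$, is straightforward to discretize by Monte Carlo, consistent with the $m = 500$ samples mentioned in the setup. A minimal sketch, assuming hypothetical callables `log_rho`, `log_pullback_pi`, and `sample_rho` for the reference log-density, the pulled-back (possibly unnormalized) posterior log-density, and a reference sampler:

```python
# Minimal sketch (not the authors' code) of a Monte Carlo estimate of the
# variance diagnostic (1/2) Var_rho[ log rho(Z) - log (T^# pi)(Z) ].
import numpy as np

def variance_diagnostic(log_rho, log_pullback_pi, sample_rho, m=500, seed=0):
    """Estimate 0.5 * Var_rho[log rho - log T^# pi] with m reference samples."""
    rng = np.random.default_rng(seed)
    z = sample_rho(rng, m)                       # m draws from the reference rho
    delta = log_rho(z) - log_pullback_pi(z)      # per-sample log-density mismatch
    return 0.5 * float(np.var(delta))            # -> 0 as (T_ell)# rho -> pi
```

Because $\pi$ enters only through a log-density difference, adding the unknown log normalizing constant shifts `delta` uniformly and leaves the variance unchanged, which is what makes the diagnostic usable as a convergence check.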
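The setup's "Gauss quadrature rules of order 10 ... ($m = 121$)" is consistent with a tensorized Gauss-Hermite rule of 11 nodes per dimension on the 2-dimensional toy example ($11^2 = 121$); that correspondence is our reading, not stated explicitly in the excerpt. A sketch under that assumption, using NumPy's probabilists' Hermite rule:

```python
# Hedged sketch: tensorized Gauss-Hermite quadrature for E[f(Z)], Z ~ N(0, I_2),
# with 11 nodes per dimension (11^2 = 121 nodes total, matching m = 121).
import numpy as np

def gauss_hermite_expectation_2d(f, n_1d=11):
    """Approximate E[f(Z)] for Z ~ N(0, I_2) with an n_1d x n_1d tensor rule."""
    x, w = np.polynomial.hermite_e.hermegauss(n_1d)   # probabilists' Hermite rule
    w = w / w.sum()                                   # normalize weights to sum to 1
    X1, X2 = np.meshgrid(x, x)                        # tensorize the 1-D nodes
    nodes = np.stack([X1.ravel(), X2.ravel()], axis=1)
    weights = np.outer(w, w).ravel()
    return float(weights @ f(nodes))                  # f maps (m, 2) -> (m,)

# Sanity check: E[||Z||^2] = 2 for Z ~ N(0, I_2).
assert abs(gauss_hermite_expectation_2d(lambda z: (z**2).sum(axis=1)) - 2.0) < 1e-10
```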