Deterministic Langevin Monte Carlo with Normalizing Flows for Bayesian Inference

Authors: Richard Grumitt, Biwei Dai, Uroš Seljak

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show on various examples that the method is competitive against state of the art sampling methods.
Researcher Affiliation | Academia | Richard D.P. Grumitt, Department of Astronomy, Tsinghua University, Beijing 100084, China; Biwei Dai, Physics Department, University of California, Berkeley, CA 94720, USA; Uroš Seljak, Physics Department, University of California and Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Pseudocode | Yes | Algorithm 1: Deterministic Langevin Monte Carlo with Normalizing Flows
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Did you include any new assets either in the supplemental material or as a URL? [Yes] Code included as supplemental material.
Open Datasets | Yes | Hierarchical logistic regression with a sparse prior applied to the German credit dataset is a popular benchmark for sampling methods [Dua and Graff, 2017]. German credit data taken from the public UCI repository. (See the model sketch below the table.)
Dataset Splits | Yes | The number of layers L can be chosen based on cross-validation, where we set aside 20% of the samples, and iterate until validation data start to diverge from the training data. (See the layer-selection sketch below the table.)
Hardware Specification | No | Typical SINF training time is of order seconds on a CPU. We assume a likelihood gradient cost of 1 minute, and the cost of the NF itself (seconds) is negligible. (No specific CPU models, GPUs, or detailed configurations are provided for running experiments.)
Software Dependencies | No | The main baseline we compare against is the No-U-Turn Sampler (NUTS) [Hoffman et al., 2014], an adaptive HMC variant implemented in the NumPyro library [Phan et al., 2019]. For NF we use SINF, which has very few hyper-parameters [Dai and Seljak, 2021], is fast, and iterative. (No specific software versions are provided. See the NUTS example below the table.)
Experiment Setup | Yes | The number of layers L can be chosen based on cross-validation, where we set aside 20% of the samples, and iterate until validation data start to diverge from the training data. However, for the d = 1000 Gaussian (Section 5.5) we fix L = 5. At each iteration, we take Adagrad updates in the ∇(U(x(t)) − V(x(t))) direction [Duchi et al., 2011]. We use learning rates between 0.001 and 0.1, with smaller learning rates being more robust for targets with complicated geometries such as funnel distributions. Where we include NUTS as a baseline, we use 500 tuning steps. (See the update-step sketch below the table.)
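
The update direction quoted above, ∇(U(x(t)) − V(x(t))), reads as a gradient step on the difference between the target potential U = −log p and the potential V = −log q_t of the current particle density estimate, i.e. a descent step along ∇U + ∇log q_t with an Adagrad step size [Duchi et al., 2011]. The following is a minimal sketch of such an update, not the authors' implementation: the names are placeholders, and a diagonal Gaussian refit to the particles stands in for the SINF density estimate that Algorithm 1 uses.

```python
import numpy as np

def adagrad_dlmc_step(x, grad_U, grad_log_q, state, lr=0.01, eps=1e-8):
    """One deterministic Langevin particle update with an Adagrad step size.

    x          : (n, d) array of particle positions.
    grad_U     : callable giving the gradient of the target potential U = -log p.
    grad_log_q : callable giving the gradient of the current density estimate
                 log q_t (supplied by the normalizing flow, SINF, in the paper).
    state      : (n, d) Adagrad accumulator of squared gradients.
    lr         : learning rate; the report above quotes values between 0.001 and 0.1.
    """
    # Descent direction grad(U - V) with V = -log q_t, i.e. grad U(x) + grad log q_t(x).
    g = grad_U(x) + grad_log_q(x)
    state = state + g**2
    x_new = x - lr * g / (np.sqrt(state) + eps)  # per-coordinate adaptive step
    return x_new, state

# Toy usage: standard-normal target, particles started too spread out. A diagonal
# Gaussian refit to the particles at every iteration is a crude stand-in for the
# normalizing-flow density estimate; this only illustrates the update mechanics.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=(2000, 2))
state = np.zeros_like(x)
for _ in range(1000):
    mu, var = x.mean(axis=0), x.var(axis=0) + 1e-6
    x, state = adagrad_dlmc_step(
        x,
        grad_U=lambda y: y,                    # U(y) = ||y||^2 / 2
        grad_log_q=lambda y: -(y - mu) / var,  # log q_t from the diagonal Gaussian fit
        state=state,
        lr=0.1,
    )
print(x.mean(axis=0), x.var(axis=0))  # particle mean/variance drift toward 0 and 1
```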
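The 20% hold-out used to choose the number of SINF layers L can be written as a simple early-stopping loop. SINF's actual interface is not quoted in the report, so `fit_flow` below is a hypothetical stand-in, and stopping once the validation log-likelihood stops improving is one reading of "iterate until validation data start to diverge from the training data".

```python
import numpy as np

def choose_num_layers(samples, fit_flow, max_layers=20, val_frac=0.2, seed=0):
    """Choose the number of NF layers L with a 20% hold-out split.

    `fit_flow(train, n_layers)` is a hypothetical stand-in for the NF fitting
    routine (SINF in the paper); it must return an object with a `log_prob(x)`
    method.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_val = int(val_frac * len(samples))
    val, train = samples[idx[:n_val]], samples[idx[n_val:]]

    best_L, best_val_ll = 1, -np.inf
    for L in range(1, max_layers + 1):
        flow = fit_flow(train, n_layers=L)
        val_ll = float(np.mean(flow.log_prob(val)))
        if val_ll < best_val_ll:  # validation fit starts to degrade: stop adding layers
            break
        best_L, best_val_ll = L, val_ll
    return best_L
```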
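The German credit benchmark quoted above is commonly formulated as logistic regression with a hierarchical sparse prior on the weights. The paper's exact parameterization is not quoted in the report, so the NumPyro model below follows a standard version of this benchmark (global scale tau, per-feature scales lam, unit-normal weights beta) and uses synthetic placeholder data; the NUTS call shows the quoted 500 tuning steps as NumPyro's warm-up phase.

```python
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def sparse_logistic_regression(X, y=None):
    """Hierarchical logistic regression with a sparse prior (a common
    parameterization of the German credit benchmark; the paper's exact
    model may differ)."""
    n, d = X.shape
    tau = numpyro.sample("tau", dist.Gamma(0.5, 0.5))         # global shrinkage
    with numpyro.plate("features", d):
        lam = numpyro.sample("lam", dist.Gamma(0.5, 0.5))     # local shrinkage
        beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))  # unscaled weights
    logits = X @ (tau * lam * beta)
    with numpyro.plate("data", n):
        numpyro.sample("obs", dist.Bernoulli(logits=logits), obs=y)

# Placeholder data; the benchmark itself uses the UCI German credit dataset.
X = random.normal(random.PRNGKey(0), (100, 5))
y = (X[:, 0] > 0).astype(jnp.int32)

# NUTS baseline via NumPyro, with the 500 tuning (warm-up) steps quoted above;
# the number of kept samples is illustrative.
mcmc = MCMC(NUTS(sparse_logistic_regression), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(1), X, y=y)
samples = mcmc.get_samples()
```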