Effective Bayesian Heteroscedastic Regression with Deep Neural Networks

Authors: Alexander Immer, Emanuele Palumbo, Alexander Marx, Julia Vogt

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of the natural parameterization compared to the mean-variance (naive) one, of empirical Bayes (EB) compared to optimizing a single regularization parameter by grid search on the validation set (GS), and of the MAP prediction compared to a Bayesian posterior predictive (PP), against state-of-the-art baselines in three experimental settings: the UCI regression benchmark [Hernández-Lobato and Adams, 2015], which is also well established for heteroscedastic regression [Seitzer et al., 2022; Stirn et al., 2023], the recently introduced CRISPR-Cas13 gene expression datasets [Stirn et al., 2023], and our proposed heteroscedastic image-regression dataset (cf. Problem 2.1) in three noise variants.
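The central methodological contrast quoted above is between two parameterizations of the same heteroscedastic Gaussian likelihood. As a minimal sketch of the distinction (illustrative names, not the authors' implementation), the two negative log-likelihoods can be written as:

```python
import torch

def naive_nll(mu, log_var, y):
    # Mean-variance ("naive") parameterization: the network predicts the
    # mean and log-variance directly. Gaussian NLL up to the additive
    # constant 0.5 * log(2 * pi).
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

def natural_nll(eta1, eta2, y):
    # Natural parameterization: eta1 = mu / var and eta2 = -1 / (2 * var),
    # so var = -1 / (2 * eta2) and mu = eta1 * var. eta2 must stay
    # negative, e.g. via a link such as -softplus on the raw network
    # output (an assumption here; the paper specifies its own link).
    var = -0.5 / eta2
    mu = eta1 * var
    return 0.5 * (var.log() + (y - mu) ** 2 / var).mean()
```

Both losses describe the same probabilistic model; the natural parameterization changes the geometry of the optimization problem rather than the likelihood itself.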
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich, Switzerland; Max Planck Institute for Intelligent Systems, Tübingen, Germany; AI Center, ETH Zurich, Switzerland
Pseudocode | Yes | Algorithm 1: Optimization of Heteroscedastic Regression Models
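The full Algorithm 1 is given in the paper and the repository; the following is only a plausible skeleton, under the assumption that the procedure interleaves MAP training of the heteroscedastic NLL with periodic empirical-Bayes updates of the prior precision. The `update_prior_precision` argument is a hypothetical no-op placeholder standing in for the paper's Laplace-based marginal-likelihood step, and `natural_nll` is the sketch from above:

```python
import torch

def train_map_with_eb(model, loader, n_epochs, lr=1e-2, marglik_every=100,
                      update_prior_precision=lambda model, loader, p: p):
    # MAP training with the L2 strength (Gaussian prior precision)
    # re-estimated online by empirical Bayes instead of grid search.
    # `update_prior_precision` defaults to a no-op placeholder; in the
    # paper this role is played by Laplace-approximated marginal-
    # likelihood maximization.
    prior_prec = torch.tensor(1.0)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    n = len(loader.dataset)
    for epoch in range(n_epochs):
        for x, y in loader:
            opt.zero_grad()
            eta1, eta2 = model(x).unbind(-1)  # two output heads
            l2 = sum((p ** 2).sum() for p in model.parameters())
            loss = natural_nll(eta1, eta2, y) + 0.5 * prior_prec * l2 / n
            loss.backward()
            opt.step()
        if (epoch + 1) % marglik_every == 0:
            prior_prec = update_prior_precision(model, loader, prior_prec)
    return model
```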
Open Source Code | Yes | Code at https://github.com/aleximmer/heteroscedastic-nn.
Open Datasets | Yes | The experiments rely on publicly available data: the UCI regression benchmark [Hernández-Lobato and Adams, 2015], the CRISPR-Cas13 gene expression datasets [Stirn et al., 2023], and the authors' proposed heteroscedastic image-regression dataset (cf. Problem 2.1) in three noise variants (quoted in full under Research Type above).
Dataset Splits | Yes | For all methods using grid search, we first split the training data into a 90/10 train-validation split.
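The split itself is straightforward to reproduce; a minimal sketch with torch.utils.data (the fixed seed is an assumption for determinism, not stated in the quote):

```python
import torch
from torch.utils.data import random_split

def split_train_val(dataset, val_frac=0.1, seed=0):
    # 90/10 train-validation split used by the grid-search (GS) baselines.
    n_val = int(round(len(dataset) * val_frac))
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [len(dataset) - n_val, n_val], generator=gen)
```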
Hardware Specification | Yes | Training was run 5 times (with different seeds) per model-dataset pair to estimate the mean and standard error, on a computing cluster with NVIDIA V100 and A100 GPUs.
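Aggregating the 5 seeded runs into the reported mean and standard error is a standard computation; for completeness, a small sketch:

```python
import numpy as np

def mean_and_stderr(seed_scores):
    # seed_scores: one metric value per seed, e.g. five test log-likelihoods.
    scores = np.asarray(seed_scores, dtype=float)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))
```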
Software Dependencies | No | The paper mentions software such as the 'PyTorch implementation from Krishnan et al. [2022]', the 'laplace-torch package [Daxberger et al., 2021]', the 'automatic second-order differentiation library [asdl; Osawa, 2021]', 'pytorch [Paszke et al., 2017]', and 'jax [Bradbury et al., 2018]', but does not provide version numbers for these general software dependencies.
Experiment Setup | Yes | We train all models, except for the VI and MC-Dropout baselines, with the Adam optimizer using a batch size of 256 for 5000 epochs and an initial learning rate of 10^-2 that is decayed to 10^-5 using a cosine schedule.
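These settings map directly onto standard PyTorch components; a sketch assuming the cosine schedule is stepped once per epoch (the two-layer network is an illustrative placeholder, not the paper's architecture):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 50), nn.Tanh(), nn.Linear(50, 2))  # placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
# Cosine decay of the learning rate from 1e-2 to 1e-5 over 5000 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=5000, eta_min=1e-5
)
for epoch in range(5000):
    ...  # one pass over the training set with batch size 256
    scheduler.step()
```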