Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Effective Bayesian Heteroscedastic Regression with Deep Neural Networks
Authors: Alexander Immer, Emanuele Palumbo, Alexander Marx, Julia Vogt
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of the natural parameterization compared to the mean-variance (naive) one, of empirical Bayes (EB) compared to optimizing a single regularization parameter by grid search on the validation set (GS), and of the MAP prediction versus a Bayesian posterior predictive (PP), in comparison to state-of-the-art baselines on three experimental settings: the UCI regression benchmark [Hernández-Lobato and Adams, 2015], which is also well established for heteroscedastic regression [Seitzer et al., 2022; Stirn et al., 2023]; the recently introduced CRISPR-Cas13 gene expression datasets [Stirn et al., 2023]; and our proposed heteroscedastic image-regression dataset (cf. Problem 2.1) in three noise variants. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, ETH Zurich, Switzerland; ²Max Planck Institute for Intelligent Systems, Tübingen, Germany; ³AI Center, ETH Zurich, Switzerland |
| Pseudocode | Yes | Algorithm 1 Optimization of Heteroscedastic Regression Models |
| Open Source Code | Yes | Code at https://github.com/aleximmer/heteroscedastic-nn. |
| Open Datasets | Yes | We evaluate the effectiveness of the natural parameterization compared to the mean-variance (naive) one, of empirical Bayes (EB) compared to optimizing a single regularization parameter by grid search on the validation set (GS), and of the MAP prediction versus a Bayesian posterior predictive (PP), in comparison to state-of-the-art baselines on three experimental settings: the UCI regression benchmark [Hernández-Lobato and Adams, 2015], which is also well established for heteroscedastic regression [Seitzer et al., 2022; Stirn et al., 2023]; the recently introduced CRISPR-Cas13 gene expression datasets [Stirn et al., 2023]; and our proposed heteroscedastic image-regression dataset (cf. Problem 2.1) in three noise variants. |
| Dataset Splits | Yes | For all methods using grid-search, we first split the training data into a 90/10 train-validation split. |
| Hardware Specification | Yes | The training was done 5 times (different seeds) per model-dataset pair to estimate mean and standard error, and the runs were performed on a computing cluster with NVIDIA V100 and A100 GPUs. |
| Software Dependencies | No | The paper mentions software such as 'PyTorch implementation from Krishnan et al. [2022]', 'laplace-torch package [Daxberger et al., 2021]', 'automatic second-order differentiation library [asdl; Osawa, 2021]', 'pytorch [Paszke et al., 2017]', and 'jax [Bradbury et al., 2018]', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train all models, except for the VI and MC-Dropout baselines, with Adam optimizer using a batch size of 256 for 5000 epochs and an initial learning rate of 10⁻² that is decayed to 10⁻⁵ using a cosine schedule. |
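The Research Type and Pseudocode rows refer to a natural parameterization of the heteroscedastic Gaussian likelihood. As a minimal sketch, and not the authors' Algorithm 1 or code, the Python below writes the same Gaussian negative log-likelihood once in mean-variance form and once in natural parameters η₁ = μ/σ² and η₂ = −1/(2σ²), and checks that they agree. The helper `natural_from_outputs`, which maps two unconstrained network outputs to (η₁, η₂) with η₂ < 0 through an exponential, is a hypothetical link chosen only for illustration.

```python
import math


def to_natural(mu: float, var: float) -> tuple[float, float]:
    """Map a Gaussian mean and variance to natural parameters (eta1, eta2)."""
    return mu / var, -1.0 / (2.0 * var)


def from_natural(eta1: float, eta2: float) -> tuple[float, float]:
    """Recover mean and variance from natural parameters (requires eta2 < 0)."""
    var = -1.0 / (2.0 * eta2)
    return eta1 * var, var


def nll_mean_variance(y: float, mu: float, var: float) -> float:
    """Gaussian negative log-likelihood in the (naive) mean-variance form."""
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mu) ** 2 / (2.0 * var)


def nll_natural(y: float, eta1: float, eta2: float) -> float:
    """Gaussian negative log-likelihood written in natural parameters."""
    if eta2 >= 0.0:
        raise ValueError("eta2 must be negative for a valid Gaussian")
    # Log-partition function A(eta) of the Gaussian in exponential-family form.
    log_partition = -(eta1 ** 2) / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)
    # -log p(y | eta) = -(eta1 * y + eta2 * y^2) + A(eta) + 0.5 * log(2 * pi)
    return -(eta1 * y + eta2 * y ** 2) + log_partition + 0.5 * math.log(2.0 * math.pi)


def natural_from_outputs(f1: float, f2: float) -> tuple[float, float]:
    """HYPOTHETICAL link: two unconstrained outputs -> (eta1, eta2) with eta2 < 0."""
    return f1, -0.5 * math.exp(f2)


if __name__ == "__main__":
    eta1, eta2 = to_natural(mu=1.5, var=0.25)
    # Both forms describe the same distribution, so the two values coincide.
    print(nll_natural(2.0, eta1, eta2), nll_mean_variance(2.0, 1.5, 0.25))
```

In natural parameters the negative log-likelihood is convex in (η₁, η₂), because the log-partition function of an exponential family is convex and the remaining terms are linear; the full set of reasons the authors prefer this form is not stated in this summary.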
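The Dataset Splits row states that methods using grid search first split the training data 90/10 into training and validation parts. A minimal sketch of such a split, and of choosing a regularization strength on the held-out part, is given below; the seeded shuffling, the function names, and the `val_loss` callback are assumptions for illustration and are not taken from the authors' repository.

```python
import random
from typing import Callable, Sequence


def train_val_split(n: int, val_fraction: float = 0.1, seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and split them into (train, validation)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = round(n * val_fraction)
    return indices[n_val:], indices[:n_val]


def grid_search(candidates: Sequence[float], val_loss: Callable[[float], float]) -> float:
    """HYPOTHETICAL helper: return the candidate with the lowest validation loss."""
    return min(candidates, key=val_loss)


if __name__ == "__main__":
    train_idx, val_idx = train_val_split(1000, seed=0)
    print(len(train_idx), len(val_idx))
```

In practice `val_loss` would train a model with the given regularization strength on the training indices and evaluate it on the validation indices; that step is left abstract here.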
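The Hardware Specification row notes that each model-dataset pair was trained with 5 seeds to estimate a mean and standard error. A minimal sketch of that aggregation follows, assuming the standard error of the mean is the sample standard deviation (n − 1 denominator) divided by √n; the exact convention used in the paper is not given here.

```python
import math
import statistics
from typing import Sequence


def mean_and_standard_error(values: Sequence[float]) -> tuple[float, float]:
    """Mean and standard error of the mean over independent runs (e.g. seeds)."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    # statistics.stdev uses the sample (n - 1) denominator.
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))
```

Passing the five per-seed values of a metric, such as test log-likelihood, yields the pair that is typically reported as mean ± standard error.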
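The Experiment Setup row describes Adam with an initial learning rate of 10⁻² decayed to 10⁻⁵ by a cosine schedule over 5000 epochs. The closed-form cosine annealing rule is sketched below in plain Python; stepping once per epoch is an assumption, since the row does not say whether the schedule advances per epoch or per mini-batch. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5000, eta_min=1e-5)` is documented with the same closed form.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 5000, lr_max: float = 1e-2, lr_min: float = 1e-5) -> float:
    """Learning rate at `epoch` under cosine annealing from lr_max down to lr_min."""
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    progress = epoch / total_epochs  # fraction of the schedule completed, in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 1250, 2500, 3750, 5000):
        print(epoch, cosine_lr(epoch))
```

The schedule starts at `lr_max`, passes the midpoint (lr_max + lr_min) / 2 halfway through training, and ends exactly at `lr_min`.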