Uncertainty Quantification via Stable Distribution Propagation

Authors: Felix Petersen, Aashwin Ananda Mishra, Hilde Kuehne, Christian Borgelt, Oliver Deussen, Mikhail Yurochkin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To empirically validate SDP, we (i) compare it to other distribution propagation approaches in a variety of settings covering total variation (TV) distance and Wasserstein distance; (ii) compare it to other uncertainty quantification methods on 8 UCI [25] regression tasks; and (iii) demonstrate the utility of Cauchy distribution propagation in selective prediction on MNIST [26] and EMNIST [27].
Researcher Affiliation | Collaboration | Felix Petersen¹, Aashwin Mishra¹, Hilde Kuehne²,³, Christian Borgelt⁴, Oliver Deussen⁵, Mikhail Yurochkin³; ¹Stanford University, ²University of Bonn, ³MIT-IBM Watson AI Lab, ⁴University of Salzburg, ⁵University of Konstanz; mail@felix-petersen.de
Pseudocode | Yes | We provide pseudo-code and PyTorch implementations of SDP in SM D. (An illustrative propagation sketch follows the table.)
Open Source Code | Yes | The code is publicly available at github.com/Felix-Petersen/distprop.
Open Datasets | Yes | 8 UCI [25] regression tasks, selective prediction on MNIST [26] and EMNIST [27], CIFAR-10 ResNet-18 [46] model.
Dataset Splits | Yes | In Tab. 4, following [9], we report the test PICP and MPIW of those models where the validation PICP lies between 92.5% and 97.5%, using the evaluation code provided by Tagasovska et al. [9]. (A metric sketch follows the table.)
Hardware Specification | Yes | Times per epoch on CIFAR-10 with a batch size of 128 on a single V100 GPU.
Software Dependencies | Yes | Tested with PyTorch version 1.13.1.
Experiment Setup | Yes | That is, we use a network with 1 ReLU-activated hidden layer with 64 hidden neurons and train it for 5000 epochs. We perform this for 20 seeds, for learning rates η ∈ {10^-2, 10^-3, 10^-4} and weight decays ∈ {0, 10^-3, 10^-2, 10^-1, 1}. For the input standard deviation, we made a single initial run with input variance σ^2 ∈ {10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0} and then (for each data set) used 11 variances at a resolution of 10^0.1 around the best initial variance. (A grid sketch follows the table.)
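
The Pseudocode row points to SM D for the authors' pseudo-code and PyTorch implementation of SDP; the released code is authoritative. As background only, here is a minimal sketch assuming a diagonal Gaussian over the inputs and a local-linearization treatment of the ReLU non-linearity; the function names are ours, not the paper's.

import torch
import torch.nn.functional as F

def propagate_linear_gaussian(mu, sigma, weight, bias=None):
    # Affine layer y = x W^T + b: the mean maps exactly; under an
    # independence assumption the variance maps through the squared weights.
    out_mu = F.linear(mu, weight, bias)
    out_var = F.linear(sigma ** 2, weight ** 2)
    return out_mu, out_var.clamp_min(1e-12).sqrt()

def propagate_relu_gaussian(mu, sigma):
    # Local linearization of ReLU at the mean: f(x) ~ f(mu) + f'(mu) * (x - mu),
    # so the standard deviation is scaled by |f'(mu)| (1 where mu > 0, else 0).
    return torch.relu(mu), sigma * (mu > 0).to(sigma.dtype)

For a Cauchy input distribution, the corresponding affine step would propagate the scale through the absolute weights rather than the squared weights; the exact rules used in the paper are in SM D and the repository.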
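
The Dataset Splits row selects models by validation PICP and reports test PICP/MPIW. The evaluation code of Tagasovska et al. [9] is the reference implementation; the following only sketches the two metric definitions and the 92.5%-97.5% validation filter, with hypothetical helper names of our own.

import numpy as np

def picp(y, lower, upper):
    # Prediction Interval Coverage Probability: fraction of targets inside the interval.
    return float(np.mean((y >= lower) & (y <= upper)))

def mpiw(lower, upper):
    # Mean Prediction Interval Width.
    return float(np.mean(upper - lower))

def report_filtered(models, lo=0.925, hi=0.975):
    # Keep models whose validation PICP lies between 92.5% and 97.5%,
    # then report their test PICP and MPIW.
    # Each model is a dict with keys "val" and "test", each a (y, lower, upper) tuple.
    kept = [m for m in models if lo <= picp(*m["val"]) <= hi]
    return [(picp(*m["test"]), mpiw(m["test"][1], m["test"][2])) for m in kept]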
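
The Experiment Setup row describes the hyperparameter sweep. Below is a sketch of how such a grid could be enumerated; in particular, "11 variances at a resolution of 10^0.1 around the best initial variance" is read here as 5 log10 steps of 0.1 on either side of the best value, which is our interpretation rather than a construction stated in the paper.

import itertools

seeds = range(20)
learning_rates = [1e-2, 1e-3, 1e-4]
weight_decays = [0.0, 1e-3, 1e-2, 1e-1, 1.0]

# Coarse initial sweep over the input variance sigma^2 (10^-8 ... 10^0).
initial_variances = [10.0 ** e for e in range(-8, 1)]

def refined_variances(best_variance, steps=5, log_step=0.1):
    # 11 variances spaced at a resolution of 10^0.1 around the best initial variance
    # (our reading: 5 steps below and 5 above on a log10 grid).
    return [best_variance * 10.0 ** (log_step * k) for k in range(-steps, steps + 1)]

# Full grid for one data set, given its best initial variance (placeholder value here).
grid = list(itertools.product(seeds, learning_rates, weight_decays, refined_variances(1e-4)))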