Uncertainty Quantification via Stable Distribution Propagation
Authors: Felix Petersen, Aashwin Ananda Mishra, Hilde Kuehne, Christian Borgelt, Oliver Deussen, Mikhail Yurochkin
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To empirically validate SDP, we (i) compare it to other distribution propagation approaches in a variety of settings covering total variation (TV) distance and Wasserstein distance; (ii) compare it to other uncertainty quantification methods on 8 UCI [25] regression tasks; and (iii) demonstrate the utility of Cauchy distribution propagation in selective prediction on MNIST [26] and EMNIST [27]. |
| Researcher Affiliation | Collaboration | Felix Petersen¹, Aashwin Mishra¹, Hilde Kuehne²,³, Christian Borgelt⁴, Oliver Deussen⁵, Mikhail Yurochkin³; ¹Stanford University, ²University of Bonn, ³MIT-IBM Watson AI Lab, ⁴University of Salzburg, ⁵University of Konstanz; mail@felix-petersen.de |
| Pseudocode | Yes | We provide pseudo-code and PyTorch implementations of SDP in SM D. (A hedged sketch of the propagation rules appears below the table.) |
| Open Source Code | Yes | 1The code is publicly available at github.com/Felix-Petersen/distprop. |
| Open Datasets | Yes | 8 UCI [25] regression tasks; selective prediction on MNIST [26] and EMNIST [27]; CIFAR-10 ResNet-18 [46] model. |
| Dataset Splits | Yes | In Tab. 4, following [9], we report the test PICP and MPIW of those models where the validation PICP lies between 92.5% and 97.5%, using the evaluation code provided by Tagasovska et al. [9]. (A sketch of the PICP/MPIW metrics appears below the table.) |
| Hardware Specification | Yes | Times per epoch on CIFAR-10 with a batch size of 128 on a single V100 GPU. |
| Software Dependencies | Yes | tested with PyTorch version 1.13.1 |
| Experiment Setup | Yes | That is, we use a network with 1 ReLU-activated hidden layer with 64 hidden neurons and train it for 5000 epochs. We perform this for 20 seeds and for a learning rate η ∈ {10^-2, 10^-3, 10^-4} and weight decay ∈ {0, 10^-3, 10^-2, 10^-1, 1}. For the input standard deviation, we made a single initial run with input variance σ² ∈ {10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0} and then (for each data set) used 11 variances at a resolution of 10^0.1 around the best initial variance. (A sketch of this sweep appears below the table.) |
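
The Pseudocode row above refers to the SDP implementations in SM D and in the public repository. The following is a minimal sketch, assuming the closed-form affine rule for independent stable inputs (Gaussian, α = 2; Cauchy, α = 1) and a local-linearization rule for ReLU; the function names are ours, and details such as the handling of correlations or of μ = 0 may differ from the released code at github.com/Felix-Petersen/distprop.

```python
# Minimal sketch of stable distribution propagation (SDP) through one
# linear layer followed by ReLU. Assumes the closed-form rule for affine
# maps of independent stable inputs and a local-linearization rule for
# the nonlinearity; the released implementation may differ in details.
import torch

def linear_sdp(mean, scale, weight, bias, alpha=2.0):
    """Propagate (mean, scale) of independent stable inputs through y = x W^T + b.

    alpha=2.0 -> Gaussian (scale = standard deviation),
    alpha=1.0 -> Cauchy   (scale = Cauchy scale parameter gamma).
    """
    out_mean = mean @ weight.T + bias
    if alpha == 2.0:    # Gaussian: variances add with squared weights
        out_scale = torch.sqrt(scale.pow(2) @ weight.pow(2).T)
    elif alpha == 1.0:  # Cauchy: scales add with absolute weights
        out_scale = scale @ weight.abs().T
    else:
        raise NotImplementedError("only Gaussian/Cauchy sketched here")
    return out_mean, out_scale

def relu_sdp(mean, scale):
    """Local linearization of ReLU: f(mu) and |f'(mu)| * scale (derivative taken as 0 at mu = 0)."""
    deriv = (mean > 0).to(mean.dtype)
    return torch.relu(mean), deriv * scale

# usage: propagate an input Gaussian N(x, sigma^2 I) through one hidden layer
layer = torch.nn.Linear(8, 64)
x = torch.randn(32, 8)
sigma = torch.full_like(x, 1e-2)
m, s = linear_sdp(x, sigma, layer.weight, layer.bias, alpha=2.0)
m, s = relu_sdp(m, s)
```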
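
The Dataset Splits row uses the PICP/MPIW model-selection rule of Tagasovska et al. [9]. Below is a small sketch of the two metrics under their usual definitions (coverage and mean width of a prediction interval); the actual evaluation uses the code released with [9], and the 95% interval construction mentioned in the comments is an assumption.

```python
# Sketch of the PICP / MPIW metrics behind the model-selection rule in
# Tab. 4 (validation PICP in [92.5%, 97.5%]). Interval bounds are generic.
import numpy as np

def picp_mpiw(y_true, lower, upper):
    """Prediction-interval coverage probability and mean prediction-interval width."""
    covered = (y_true >= lower) & (y_true <= upper)
    picp = covered.mean()
    mpiw = (upper - lower).mean()
    return picp, mpiw

# e.g. a 95% Gaussian interval from a predicted mean/std (assumed construction):
#   lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma
# keep models whose validation PICP lies in [0.925, 0.975]
```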
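
The Experiment Setup row describes the UCI sweep: one hidden ReLU layer with 64 units, 5000 epochs, 20 seeds, a learning-rate/weight-decay grid, and a coarse-then-fine grid over the input variance. Below is a sketch of that grid, assuming a plain PyTorch MLP and an illustrative Adam optimizer; the paper's actual optimizer, loss, and data loading are not quoted above and are omitted here.

```python
# Hyper-parameter grid sketch for the quoted UCI regression setup
# (1 hidden ReLU layer, 64 units, 5000 epochs, 20 seeds). Dataset loading
# and the SDP objective are omitted; the optimizer choice is illustrative.
import itertools
import torch

learning_rates = [1e-2, 1e-3, 1e-4]
weight_decays = [0.0, 1e-3, 1e-2, 1e-1, 1.0]
input_variances = [10.0 ** e for e in range(-8, 1)]  # initial coarse grid, 10^-8 .. 10^0

def make_model(in_dim, out_dim):
    return torch.nn.Sequential(
        torch.nn.Linear(in_dim, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, out_dim),
    )

for seed, lr, wd in itertools.product(range(20), learning_rates, weight_decays):
    torch.manual_seed(seed)
    model = make_model(in_dim=8, out_dim=1)  # in_dim depends on the UCI task
    optim = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    # ... train for 5000 epochs and evaluate on the test split

# refinement: 11 variances at a resolution of 10^0.1 around the best coarse variance
best_log10_var = -4.0  # placeholder value from the coarse run
refined_variances = [10.0 ** (best_log10_var + 0.1 * k) for k in range(-5, 6)]
```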