A Rate-Distortion View of Uncertainty Quantification
Authors: Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that our method can detect both OOD samples and misclassified samples. In particular, DAB outperforms baselines when used for OOD tasks and closes the gap between single forward pass methods and expensive ensembles in terms of calibration (Tables 2, 4). |
| Researcher Affiliation | Collaboration | ¹Machine Learning Department, Auton Lab, Carnegie Mellon University; ²Computer Science Department, Princeton University; ³Sony Computer Science Laboratories Inc., Tokyo, Japan. |
| Pseudocode | Yes | The pseudocode of our method (Algorithm 1) along with a practical implementation for mini-batch training is given in Appendix B. |
| Open Source Code | Yes | Publicly available code for reproducing the experiments can be found at: https://github.com/ifiaposto/Distance_Aware_Bottleneck |
| Open Datasets | Yes | Train CIFAR-10; trained on the CIFAR-10 dataset; ImageNet-1K dataset; UCI Energy Efficiency dataset (Markelle Kelly, 1998). |
| Dataset Splits | No | The paper discusses training and testing on standard datasets like CIFAR-10 and ImageNet, which have predefined splits, but it does not explicitly provide percentages or counts for training/validation/test splits or mention a specific validation set. |
| Hardware Specification | Yes | All models are trained on four 32GB V100 GPUs; All models are trained on four 48GB RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using 'tf.keras.optimizers.Adam', 'tf.keras.layers.Dense', 'tf.keras.optimizers.SGD', and 'tf.keras.applications.resnet50.ResNet50', implying TensorFlow/Keras, but it does not specify exact version numbers for these software components. |
| Experiment Setup | Yes | We use a network with 3 dense layers. We apply DAB to the last one. The intermediate layers have 100 hidden units and ELU non-linearity. We perform 1500 training iterations. The optimizers of the main network and the centroids are set to tf.keras.optimizers.Adam with initial learning rates ηθ = 0.001 and ηϕ = 0.01, respectively. All models are trained for 200 epochs. |
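
The Experiment Setup row describes the UCI regression configuration concretely enough to sketch. Below is a minimal, hypothetical Keras reconstruction of that setup: the layer count, 100 hidden units, ELU activation, and the two Adam learning rates (ηθ = 0.001 for the network, ηϕ = 0.01 for the centroids) come from the quoted text, while a plain `Dense` head stands in for the authors' DAB layer, whose actual implementation lives in the linked repository. This is an illustrative sketch, not the authors' code.

```python
# Hypothetical sketch of the UCI regression setup quoted above; not the
# authors' implementation (see
# https://github.com/ifiaposto/Distance_Aware_Bottleneck).
import tensorflow as tf

def build_network(input_dim: int = 8, output_dim: int = 1) -> tf.keras.Model:
    """Network with 3 dense layers; the intermediate ones have
    100 hidden units and ELU non-linearity, per the quoted setup."""
    inputs = tf.keras.Input(shape=(input_dim,))
    h = tf.keras.layers.Dense(100, activation="elu")(inputs)
    h = tf.keras.layers.Dense(100, activation="elu")(h)
    # The paper applies DAB to the last layer; a plain Dense head
    # stands in here for the distance-aware bottleneck.
    outputs = tf.keras.layers.Dense(output_dim)(h)
    return tf.keras.Model(inputs, outputs)

model = build_network()  # UCI Energy Efficiency has 8 input features

# Separate Adam optimizers for the main network (theta) and the DAB
# centroids (phi), with the initial learning rates from the paper.
# The centroid variables updated by optimizer_phi would live inside
# the real DAB layer; here they are only named for illustration.
optimizer_theta = tf.keras.optimizers.Adam(learning_rate=1e-3)
optimizer_phi = tf.keras.optimizers.Adam(learning_rate=1e-2)
```

The paper reports 1500 training iterations for this task; a full training loop would apply `optimizer_theta` to the network weights and `optimizer_phi` to the centroid variables each step.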