Getting a CLUE: A Method for Explaining Uncertainty Estimates
Authors: Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate CLUE through 1) a novel framework for evaluating counterfactual explanations of uncertainty, 2) a series of ablation experiments, and 3) a user study. Our experiments show that CLUE outperforms baselines and enables practitioners to better understand which input patterns are responsible for predictive uncertainty. |
| Researcher Affiliation | Academia | Javier Antorán (University of Cambridge, ja666@cam.ac.uk); Umang Bhatt (University of Cambridge, usb20@cam.ac.uk); Tameem Adel (University of Cambridge; University of Liverpool, tah47@cam.ac.uk); Adrian Weller (University of Cambridge; The Alan Turing Institute, aw665@cam.ac.uk); José Miguel Hernández-Lobato (University of Cambridge; The Alan Turing Institute, jmh233@cam.ac.uk) |
| Pseudocode | Yes | The CLUE algorithm and a diagram of our procedure are provided in Algorithm 1 and Figure 4, respectively. |
| Open Source Code | Yes | Our code is at: github.com/cambridge-mlg/CLUE. |
| Open Datasets | Yes | We validate CLUE on LSAT academic performance regression (Wightman et al., 1998), UCI Wine quality regression, UCI Credit classification (Dua & Graff, 2017), a 7 feature variant of COMPAS recidivism classification (Angwin et al.), and MNIST image classification (LeCun & Cortes, 2010). |
| Dataset Splits | Yes | For each, we select roughly the 20% most uncertain test points as those for which we reject our BNN's decisions. We only generate CLUEs for rejected points. Rejection thresholds, architectures, and hyperparameters are in Appendix B. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'RAdam optimizer (Liu et al., 2020)' but does not specify software versions for frameworks or libraries like PyTorch, TensorFlow, etc., or for the optimizer itself. |
| Experiment Setup | Yes | Optimization runs for a minimum of three iterations and a maximum of 35 iterations, with a learning rate of 0.1. [...] We use a fixed step size of ϵ = 0.01 and batch sizes of 512. [...] We train all generative models with the RAdam optimizer (Liu et al., 2020) with a learning rate of 1e-4 for tabular data and 3e-4 for MNIST. [...] All architectural hyperparameters are provided in Table 4. [...] The rejection thresholds used for each dataset are displayed in Table 5. The same table contains the values of λx used in all experiments. |
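The setup row above describes CLUE's counterfactual search: gradient descent (up to 35 steps, learning rate 0.1) on an objective that trades off the model's predictive uncertainty against a λx-weighted distance to the original input. The toy sketch below is not the authors' implementation — it runs directly in input space with a stand-in logistic "model" and finite-difference gradients, rather than CLUE's latent-space search through a deep generative model — but it shows the shape of the objective. All function names and values here are illustrative.

```python
import numpy as np

def predictive_entropy(x, w):
    """Toy uncertainty: entropy of a logistic classifier's prediction.
    Stands in for a BNN's predictive entropy H(y | x)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    p = np.clip(p, 1e-8, 1 - 1e-8)
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)))

def clue_sketch(x0, w, lam=0.01, lr=0.1, steps=35, eps=1e-4):
    """Minimize H(x') + lam * ||x' - x0||_1 by gradient descent,
    using central finite differences in place of autodiff.
    `lam` plays the role of the paper's lambda_x; `lr` and `steps`
    mirror the reported learning rate of 0.1 and 35-iteration cap."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            f_plus = predictive_entropy(x + d, w) + lam * np.abs((x + d) - x0).sum()
            f_minus = predictive_entropy(x - d, w) + lam * np.abs((x - d) - x0).sum()
            grad[i] = (f_plus - f_minus) / (2 * eps)
        x -= lr * grad
    return x

# Start near the decision boundary (high entropy) and search for a
# nearby, lower-uncertainty counterfactual.
x0 = np.array([0.1, -0.05])
w = np.array([1.0, 1.0])
x_clue = clue_sketch(x0, w)
```

The resulting `x_clue` has lower predictive entropy than `x0` while the ℓ1 penalty keeps it close to the original point; in CLUE proper the same trade-off is optimized over a VAE latent code so that counterfactuals stay on the data manifold.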