Getting a CLUE: A Method for Explaining Uncertainty Estimates

Authors: Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate CLUE through 1) a novel framework for evaluating counterfactual explanations of uncertainty, 2) a series of ablation experiments, and 3) a user study. Our experiments show that CLUE outperforms baselines and enables practitioners to better understand which input patterns are responsible for predictive uncertainty.
Researcher Affiliation | Academia | Javier Antorán (University of Cambridge, ja666@cam.ac.uk); Umang Bhatt (University of Cambridge, usb20@cam.ac.uk); Tameem Adel (University of Cambridge and University of Liverpool, tah47@cam.ac.uk); Adrian Weller (University of Cambridge and The Alan Turing Institute, aw665@cam.ac.uk); José Miguel Hernández-Lobato (University of Cambridge and The Alan Turing Institute, jmh233@cam.ac.uk)
Pseudocode | Yes | The CLUE algorithm and a diagram of our procedure are provided in Algorithm 1 and Figure 4, respectively. (a hedged sketch of this search follows the table)
Open Source Code | Yes | Our code is at: github.com/cambridge-mlg/CLUE.
Open Datasets | Yes | We validate CLUE on LSAT academic performance regression (Wightman et al., 1998), UCI Wine quality regression, UCI Credit classification (Dua & Graff, 2017), a 7-feature variant of COMPAS recidivism classification (Angwin et al., 2016), and MNIST image classification (LeCun & Cortes, 2010).
Dataset Splits | Yes | For each, we select roughly the 20% most uncertain test points as those for which we reject our BNN's decisions. We only generate CLUEs for rejected points. Rejection thresholds, architectures, and hyperparameters are in Appendix B. (the rejection rule is sketched below the table)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'RAdam optimizer (Liu et al., 2020)' but does not specify software versions for frameworks or libraries such as PyTorch or TensorFlow, or for the optimizer itself.
Experiment Setup | Yes | Optimization runs for a minimum of three iterations and a maximum of 35 iterations, with a learning rate of 0.1. [...] We use a fixed step size of ϵ = 0.01 and batch sizes of 512. [...] We train all generative models with the RAdam optimizer (Liu et al., 2020) with a learning rate of 1e-4 for tabular data and 3e-4 for MNIST. [...] All architectural hyperparameters are provided in Table 4. [...] The rejection thresholds used for each dataset are displayed in Table 5. The same table contains the values of λx used in all experiments. (the optimizer configuration is sketched below the table)
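The rows above pin down CLUE's counterfactual search: initialize at the VAE encoding of a rejected input, then gradient-descend in latent space on the BNN's predictive uncertainty plus a distance penalty weighted by λx. Below is a minimal sketch of that loop, assuming hypothetical `encoder`, `decoder`, and `predictive_entropy` callables, an L1 distance, and an Adam optimizer with the quoted learning rate and iteration budget; the plateau stopping rule is also an assumption, not the authors' exact criterion.

```python
import torch

def clue_search(x0, encoder, decoder, predictive_entropy,
                lambda_x=1.0, lr=0.1, min_steps=3, max_steps=35, tol=1e-3):
    """Find a counterfactual x_cf = decoder(z) that the BNN is certain about
    while staying close to the original input x0."""
    # Initialize at the VAE encoding of the rejected input (z0 = encoder(x0)).
    z = encoder(x0).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)  # optimizer choice is an assumption

    prev_loss = float("inf")
    for step in range(max_steps):
        opt.zero_grad()
        x_cf = decoder(z)
        # Objective: predictive uncertainty + lambda_x * L1 distance to x0.
        loss = predictive_entropy(x_cf) + lambda_x * (x_cf - x0).abs().sum()
        loss.backward()
        opt.step()
        # Run at least min_steps; this plateau criterion is an assumption.
        if step + 1 >= min_steps and abs(prev_loss - loss.item()) < tol:
            break
        prev_loss = loss.item()
    return decoder(z).detach()
```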
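The rejection rule in the 'Dataset Splits' row (flag roughly the 20% most uncertain test points and generate CLUEs only for those) reduces to a quantile cut on per-point uncertainty. A minimal sketch, assuming a hypothetical `predictive_entropies` array of per-point BNN uncertainty estimates; the paper itself uses the per-dataset thresholds in its Table 5 rather than a fixed fraction:

```python
import numpy as np

def select_rejected(predictive_entropies, reject_fraction=0.2):
    """Return indices of roughly the `reject_fraction` most uncertain
    test points; CLUEs are generated only for these."""
    threshold = np.quantile(predictive_entropies, 1.0 - reject_fraction)
    return np.where(predictive_entropies >= threshold)[0]
```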
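Finally, the only software detail the 'Experiment Setup' row fixes for generative-model training is the optimizer and its learning rates. A hedged configuration sketch using `torch.optim.RAdam`, which implements Liu et al. (2020); whether the authors used this particular implementation is unknown:

```python
import torch

def make_generative_optimizer(model, dataset="tabular"):
    """Optimizer for VAE training, e.g. make_generative_optimizer(vae, "mnist")."""
    # Learning rates from the quoted setup: 1e-4 (tabular), 3e-4 (MNIST).
    lr = 1e-4 if dataset == "tabular" else 3e-4
    return torch.optim.RAdam(model.parameters(), lr=lr)
```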