JAWS: Auditing Predictive Uncertainty Under Covariate Shift

Authors: Drew Prinster, Anqi Liu, Suchi Saria

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Practically, JAWS outperform state-of-the-art predictive inference baselines in a variety of biased real world data sets for interval-generation and error-assessment predictive uncertainty auditing tasks.
Researcher Affiliation | Academia | Drew Prinster, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, drew@cs.jhu.edu; Anqi Liu, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, aliu@cs.jhu.edu; Suchi Saria, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, ssaria@cs.jhu.edu
Pseudocode | No | The paper describes algorithms but does not include a formal pseudocode block or algorithm environment.
Open Source Code | Yes | Additional analysis in Appendix D and code at https://github.com/drewprinster/jaws.git.
Open Datasets | Yes | We conduct experiments on five UCI datasets [Dua and Graff, 2017] with various dimensionality (Table 2): airfoil self-noise, red wine quality prediction [Cortez et al., 2009], wave energy converters, superconductivity [Hamidieh, 2018], and communities and crime [Redmond and Baveja, 2002].
Dataset Splits | No | We first randomly sample 200 points for the training data, and then sample the biased test data from the remaining datapoints that are not used for training with probabilities proportional to exponential tilting weights. Beyond the 200 training points and the "remaining datapoints" used for the biased test set, no explicit validation split or train/test percentages are reported (see the sampling sketch after this table).
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU models, memory details) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation.
Experiment Setup | Yes | For neural network predictors, we use a 3-layer feed-forward neural network with ReLU activations and 512, 256, 128 units respectively. For random forest predictors, we use an ensemble of 100 decision trees. The learning rate for neural networks was chosen by tuning over the range {1e-3, 1e-4, 1e-5} and the number of training epochs was chosen by tuning over {50, 100, 150, 200}. (Configuration sketches based on this description follow the table.)
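
The "Dataset Splits" row describes how the covariate-shifted test sets are drawn: 200 training points sampled uniformly, then a biased test set sampled from the rest with probabilities proportional to exponential tilting weights. Below is a minimal sketch of that protocol, assuming the tilt is applied to the raw features; the tilting direction beta, the test-set size, and the function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def tilted_train_test_split(X, y, n_train=200, n_test=100, beta=None, seed=0):
    """Sketch of the split in the 'Dataset Splits' row (illustrative defaults).

    n_train points are drawn uniformly at random for training; the biased test
    set is then drawn from the remaining points with probabilities proportional
    to exponential tilting weights exp(x @ beta).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Uniformly sampled training set.
    train_idx = rng.choice(n, size=n_train, replace=False)
    rest = np.setdiff1d(np.arange(n), train_idx)

    # Exponential tilting weights on the held-out pool (beta is an assumption).
    beta = np.zeros(X.shape[1]) if beta is None else np.asarray(beta)
    logits = X[rest] @ beta
    weights = np.exp(logits - logits.max())  # stabilized exponential tilt
    probs = weights / weights.sum()

    # Biased test set sampled proportionally to the tilting weights.
    n_test = min(n_test, rest.size)
    test_idx = rng.choice(rest, size=n_test, replace=False, p=probs)
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])
```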
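
Likewise, the "Experiment Setup" row can be read as a concrete predictor configuration. The sketch below uses scikit-learn as an assumed implementation (the summary does not name the framework); the helper names and the tuning-grid enumeration are illustrative, and the selection criterion for the grid is not stated here.

```python
from itertools import product

from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

# 3-layer feed-forward network with ReLU activations and 512/256/128 units,
# matching the 'Experiment Setup' description. Using MLPRegressor is an
# assumption; the authors' actual implementation may differ.
def make_mlp(learning_rate, epochs):
    return MLPRegressor(
        hidden_layer_sizes=(512, 256, 128),
        activation="relu",
        learning_rate_init=learning_rate,
        max_iter=epochs,
    )

# Random forest predictor: an ensemble of 100 decision trees.
def make_random_forest():
    return RandomForestRegressor(n_estimators=100)

# Tuning grid quoted in the summary: learning rate over {1e-3, 1e-4, 1e-5}
# and training epochs over {50, 100, 150, 200}.
TUNING_GRID = list(product([1e-3, 1e-4, 1e-5], [50, 100, 150, 200]))
```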