JAWS: Auditing Predictive Uncertainty Under Covariate Shift
Authors: Drew Prinster, Anqi Liu, Suchi Saria
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Practically, JAWS outperforms state-of-the-art predictive inference baselines on a variety of biased real-world datasets for both interval-generation and error-assessment predictive uncertainty auditing tasks. |
| Researcher Affiliation | Academia | Drew Prinster (drew@cs.jhu.edu), Anqi Liu (aliu@cs.jhu.edu), and Suchi Saria (ssaria@cs.jhu.edu) — Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211. |
| Pseudocode | No | The paper describes algorithms but does not include a formal pseudocode block or algorithm environment. |
| Open Source Code | Yes | Additional analysis in Appendix D and code at https://github.com/drewprinster/jaws.git. |
| Open Datasets | Yes | We conduct experiments on five UCI datasets [Dua and Graff, 2017] of varying dimensionality (Table 2): airfoil self-noise, red wine quality prediction [Cortez et al., 2009], wave energy converters, superconductivity [Hamidieh, 2018], and communities and crime [Redmond and Baveja, 2002]. |
| Dataset Splits | No | We first randomly sample 200 points for the training data, and then sample the biased test data from the remaining datapoints (those not used for training) with probabilities proportional to exponential tilting weights (see the sampling sketch after the table). No explicit validation split or specific train/test percentages are provided beyond the 200 training points and the "remaining datapoints" used for testing. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU models, memory details) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | For neural network predictors, we use a 3-layer feed-forward neural network with ReLU activations and 512, 256, and 128 units per layer, respectively. For random forest predictors, we use an ensemble of 100 decision trees. The neural network learning rate was tuned over {1e-3, 1e-4, 1e-5} and the number of training epochs over {50, 100, 150, 200} (see the predictor-setup sketch after the table). |
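
The dataset-split row describes sampling a uniform training set and then a covariate-shifted test set via exponential tilting. The following is a minimal sketch of that procedure, not the paper's exact code: the tilting direction `lam`, the test-set size, and the choice to tilt along the first feature are illustrative assumptions.

```python
import numpy as np

def exponential_tilting_split(X, y, n_train=200, n_test=1000, lam=None, seed=0):
    """Sample a uniform training set, then a covariate-shifted test set whose
    inclusion probabilities are proportional to exp(x @ lam)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Uniformly sample the 200 training points described in the paper.
    train_idx = rng.choice(n, size=n_train, replace=False)
    remaining = np.setdiff1d(np.arange(n), train_idx)

    # Exponential tilting weights on the remaining points (tilting direction
    # is an illustrative assumption, not the paper's chosen parameter).
    if lam is None:
        lam = np.zeros(X.shape[1])
        lam[0] = 1.0
    w = np.exp(X[remaining] @ lam)
    p = w / w.sum()

    # Biased test sample: probabilities proportional to the tilting weights.
    test_idx = rng.choice(remaining, size=min(n_test, remaining.size),
                          replace=False, p=p)
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])
```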
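
The experiment-setup row specifies the predictor architectures and tuning grids. The sketch below uses scikit-learn equivalents; the paper's released code may use a different framework, and the held-out selection criterion in `tune_mlp` is an assumption for illustration.

```python
from itertools import product
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

def make_random_forest(seed=0):
    # Ensemble of 100 decision trees, as stated in the setup.
    return RandomForestRegressor(n_estimators=100, random_state=seed)

def make_mlp(learning_rate, epochs, seed=0):
    # 3-layer feed-forward network with ReLU activations and 512/256/128 units.
    return MLPRegressor(hidden_layer_sizes=(512, 256, 128), activation="relu",
                        learning_rate_init=learning_rate, max_iter=epochs,
                        random_state=seed)

def tune_mlp(X_train, y_train, X_val, y_val):
    # Grid over the learning rates and epoch counts reported in the paper;
    # selection by held-out R^2 is an illustrative assumption.
    best_score, best_model = -float("inf"), None
    for lr, epochs in product([1e-3, 1e-4, 1e-5], [50, 100, 150, 200]):
        model = make_mlp(lr, epochs).fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_score, best_model = score, model
    return best_model
```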