Estimating Uncertainty Online Against an Adversary
Authors: Volodymyr Kuleshov, Stefano Ermon
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We establish formal guarantees for our methods, and we validate them on two real-world problems: question answering and medical diagnosis from genomic data. We now proceed to study Algorithm 1 empirically. |
| Researcher Affiliation | Academia | Volodymyr Kuleshov Stanford University Stanford, CA 94305 tkuleshov@cs.stanford.edu Stefano Ermon Stanford University Stanford, CA 94305 ermon@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Online Recalibration. Require: Online calibration subroutine $F^{cal}$ and number of buckets $M$. 1: Let $I = \{[0, \frac{1}{M}), \ldots, [\frac{M-1}{M}, 1]\}$ be a set of intervals that partition $[0, 1]$. 2: Let $F = \{F^{cal}_j \mid j = 0, \ldots, M-1\}$ be a set of $M$ independent instances of $F^{cal}$. 3: for $t = 1, 2, \ldots$ do 4: Observe uncalibrated forecast $p^F_t$. 5: Let $I_j \in I$ be the interval containing $p^F_t$. 6: Let $p_t$ be the forecast of $F^{cal}_j$. 7: Output $p_t$. Observe $y_t$ and pass it to $F^{cal}_j$. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | Natural language understanding. We used Algorithm 1 to recalibrate a state-of-the-art question answering system (Berant and Liang 2014) on the popular Free917 dataset (641 training, 276 testing examples). Medical diagnosis. Our last task is predicting the risk of type 1 diabetes from genomic data. We use genotypes of 3,443 subjects (1,963 cases, 1,480 controls) over 447,221 SNPs (The Wellcome Trust Case Control Consortium 2007). |
| Dataset Splits | Yes | Natural language understanding. We used Algorithm 1 to recalibrate a state-of-the-art question answering system (Berant and Liang 2014) on the popular Free917 dataset (641 training, 276 testing examples). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using a 'linear support vector machine (SVM)' but does not provide specific software names with version numbers for its implementation or other dependencies. |
| Experiment Setup | Yes | We used an online ℓ1-regularized linear support vector machine (SVM) to predict outcomes one patient at a time, and report performance for each $t \in [T]$. Uncalibrated probabilities are normalized raw SVM scores $s_t$, i.e. $p^F_t = (s_t + m_t)/(2 m_t)$, where $m_t = \max_{1 \le r \le t} \lvert s_r \rvert$. |
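
The bucketing loop of Algorithm 1 and the score normalization quoted in the Experiment Setup row can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (the paper releases no code). The names `RunningMeanForecaster`, `OnlineRecalibrator`, and `normalize_scores` are hypothetical. The running-mean forecaster is only a simple stand-in for the calibration subroutine $F^{cal}$ and carries none of the paper's guarantees against an adversary. Returning 0.5 when every score seen so far is zero is also an assumption, since the quoted formula is undefined in that case.

```python
from typing import Callable, List, Optional


class RunningMeanForecaster:
    """Hypothetical stand-in for F^cal: forecasts the mean of the labels seen so far."""

    def __init__(self, prior: float = 0.5) -> None:
        self.prior = prior  # forecast used before any label is observed (assumption)
        self.total = 0.0
        self.count = 0

    def predict(self) -> float:
        return self.total / self.count if self.count else self.prior

    def update(self, y: int) -> None:
        self.total += y
        self.count += 1


class OnlineRecalibrator:
    """Sketch of Algorithm 1: one independent subroutine per bucket of [0, 1]."""

    def __init__(
        self,
        num_buckets: int,
        make_subroutine: Callable[[], RunningMeanForecaster] = RunningMeanForecaster,
    ) -> None:
        self.num_buckets = num_buckets
        # Step 2: M independent instances of the calibration subroutine.
        self.subroutines = [make_subroutine() for _ in range(num_buckets)]
        self._active: Optional[int] = None

    def bucket_index(self, p_raw: float) -> int:
        # Step 5: intervals are [j/M, (j+1)/M); the last one is closed at 1.
        return min(int(p_raw * self.num_buckets), self.num_buckets - 1)

    def predict(self, p_raw: float) -> float:
        # Steps 4-7: route the uncalibrated forecast to its bucket's subroutine.
        self._active = self.bucket_index(p_raw)
        return self.subroutines[self._active].predict()

    def observe(self, y: int) -> None:
        # Step 7: pass the outcome to the subroutine that produced the forecast.
        if self._active is None:
            raise RuntimeError("call predict() before observe()")
        self.subroutines[self._active].update(y)
        self._active = None


def normalize_scores(scores: List[float]) -> List[float]:
    """Map raw scores s_t to p_t = (s_t + m_t) / (2 m_t), with m_t = max_{r <= t} |s_r|."""
    probs = []
    m = 0.0
    for s in scores:
        m = max(m, abs(s))
        # Assumption: use 0.5 while all scores so far are zero (formula undefined).
        probs.append(0.5 if m == 0 else (s + m) / (2 * m))
    return probs
```

In use, each round calls `predict` with the normalized score and then `observe` with the true label, so only the subroutine for the matching bucket is updated. Swapping in a genuinely calibrated online forecaster for `RunningMeanForecaster` only requires an object with the same `predict` and `update` methods.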