Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors
Authors: Jayaraman J. Thiagarajan, Bindya Venkatesh, Prasanna Sattigeri, Peer-Timo Bremer
Pages: 6005-6012
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using experiments in regression, time-series forecasting, and object localization, we show that our approach achieves significant improvements over existing uncertainty quantification methods, both in terms of model fidelity and calibration error. |
| Researcher Affiliation | Collaboration | Lawrence Livermore National Laboratory, Arizona State University, IBM Research AI |
| Pseudocode | Yes | Algorithm 1: Building calibrated deep predictive models. Input: labeled data {(x_i, y_i)}, i = 1, ..., N; desired calibration level α; number of epochs n_m and n_c. Output: trained mean and interval estimators F and I. Initialization: randomly initialize parameters Θ, Φ; while not converged do: for n_m epochs do: compute intervals δ_i^u, δ_i^l = I(x_i; Φ); compute loss function L_F using Eq. (7) for Sigma Fit or Eq. (10) for IQR Fit; update Θ = arg min_Θ L_F; end; for n_c epochs do: obtain predictions ŷ_i = F(x_i; Θ); compute loss function L_I using Eq. (5); update Φ = arg min_Φ L_I; end; end |
| Open Source Code | No | The paper does not provide an explicit statement or link for the availability of its source code. |
| Open Datasets | Yes | Datasets: We considered 6 datasets from the UCI repository (Dua and Graff 2017), which ranged between 7 and 124 in their dimensionality, and between 391 and 4898 in terms of sample size: crime, red wine quality, white wine quality, parkinsons, boston housing and auto mpg. |
| Dataset Splits | Yes | For each dataset, we used a random 80/20 split for train/test, and we report the averaged results obtained from 5 random trials. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions deep learning frameworks but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Hyper-parameter Choices: We used Algorithm 1 to solve the optimization problem in Eq. (1), based on the two proposed strategies, namely Sigma Fit and IQR Fit. We used the following hyper-parameter choices for all experiments: The penalties for the loss L_I in Eq. (5) were fixed at β_n = 0.1 and β_s = 0.3 respectively. For the Sigma Fit method, the penalty for the uncertainty matching term was set at λ_m = 0.5. Similarly, the hyper-parameters for constructing the loss L_F for IQR Fit, we used λ_m = 0.4, λ_u = λ_l = 0.3. While the outer loop in Algorithm 1 was run until convergence, both the mean estimator and the PI estimator networks were trained for n_m = n_c = 10 epochs in each iteration. |
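
The alternating schedule in Algorithm 1 and the random 80/20 split quoted in the table can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the paper's code is not available, and the loss functions of Eqs. (5), (7) and (10) are not reproduced on this page, so the two training steps are left as caller-supplied callables. The names `HPARAMS`, `split_80_20`, `alternate`, `mean_epoch`, `interval_epoch`, `converged` and the `max_outer` safety cap are all assumptions introduced for this sketch; only the numeric values come from the table.

```python
import random

# Hyper-parameter values as quoted in the table; the dictionary layout is an assumption.
HPARAMS = {
    "beta_n": 0.1, "beta_s": 0.3,                       # penalties in L_I, Eq. (5)
    "sigma_fit": {"lambda_m": 0.5},                      # uncertainty matching penalty
    "iqr_fit": {"lambda_m": 0.4, "lambda_u": 0.3, "lambda_l": 0.3},
    "n_m": 10, "n_c": 10,                                # epochs per inner loop
}

def split_80_20(n_samples, seed):
    """Return shuffled (train, test) index lists for a random 80/20 split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = (4 * n_samples) // 5
    return idx[:cut], idx[cut:]

def alternate(mean_epoch, interval_epoch, converged, n_m, n_c, max_outer=50):
    """Alternate n_m mean-estimator epochs with n_c interval-estimator epochs.

    mean_epoch, interval_epoch and converged are hypothetical hooks supplied
    by the caller; max_outer is a safety cap that is not part of Algorithm 1.
    Returns the number of outer iterations executed.
    """
    outer = 0
    while outer < max_outer and not converged():
        for _ in range(n_m):
            mean_epoch()        # stands in for one epoch minimising L_F over Theta
        for _ in range(n_c):
            interval_epoch()    # stands in for one epoch minimising L_I over Phi
        outer += 1
    return outer

# Toy usage: counters replace real training so the schedule itself is visible.
calls = {"mean": 0, "interval": 0}

def mean_epoch():
    calls["mean"] += 1

def interval_epoch():
    calls["interval"] += 1

outer_done = alternate(
    mean_epoch, interval_epoch,
    converged=lambda: calls["mean"] >= 30,   # placeholder stopping rule
    n_m=HPARAMS["n_m"], n_c=HPARAMS["n_c"],
)
train_idx, test_idx = split_80_20(100, seed=0)
```

In a real experiment the two hooks would wrap optimizer steps for the mean and interval networks, and the split would be redrawn with five different seeds so the reported metrics can be averaged over the five trials.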