U-trustworthy Models. Reliability, Competence, and Confidence in Decision-Making

Authors: Ritwik Vashistha, Arya Farahi

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our first set of results challenges the probabilistic framework by demonstrating its potential to favor less trustworthy models and introduce the risk of misleading trustworthiness assessments. ... By offering both theoretical guarantees and experimental validation, AUC enables robust evaluation of trustworthiness, thereby enhancing model selection and hyperparameter tuning to yield more trustworthy outcomes.
Researcher Affiliation | Academia | Ritwik Vashistha*, Arya Farahi; The University of Texas at Austin; ritwik.v@utexas.edu, arya.farahi@austin.utexas.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information for open-source code for the methodology described.
Open Datasets | No | The paper mentions using 'the 2019 American Housing Survey data' and refers to 'Supplementary Materials for the description of data sets', but does not provide a direct link, DOI, or formal citation for accessing this specific dataset.
Dataset Splits | Yes | To conduct our analysis, we employ a homeownership dataset and perform 20-fold cross-validation to fine-tune k. ... In the right panel of Figure 2, we present the average maximum utility on the test sample for 20 random test/train realizations. ... We repeated the experiment with 200 random data realizations, and the line shows the mean, and the shaded region is the standard error on the mean.
Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiment.
Experiment Setup | No | The paper discusses hyperparameter tuning for k in k-NN and varying decision thresholds, but it does not provide specific values for hyperparameters or other detailed training configurations (e.g., learning rates, batch sizes, optimizers) for the models used.
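
The Dataset Splits and Experiment Setup entries refer to tuning k in k-NN with 20-fold cross-validation and to reporting a mean with its standard error. Since the paper provides no code, the following is only a minimal sketch of that general procedure, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic two-cluster data (the homeownership dataset is not used), the candidate values of k, the helper names, and the use of plain accuracy as the selection score in place of whatever criterion the paper optimizes.

```python
import random
import statistics


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0  # k is kept odd, so no ties occur


def cross_validate_k(X, y, k, n_folds=20, seed=0):
    """Return (mean accuracy, standard error of the mean) across n_folds folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, sem


def make_toy_data(n=200, seed=1):
    """Hypothetical stand-in data: two Gaussian clusters with binary labels."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label else -1.5
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y


X, y = make_toy_data()
candidate_ks = [1, 5, 15]  # illustrative values; the paper does not list its grid
results = {k: cross_validate_k(X, y, k) for k in candidate_ks}
best_k = max(results, key=lambda k: results[k][0])

for k, (mean, sem) in results.items():
    print(f"k={k}: accuracy {mean:.3f} +/- {sem:.3f}")
print("selected k:", best_k)
```

Swapping the accuracy computation inside cross_validate_k for a utility-based score would be the natural place to adapt this sketch to a different selection criterion.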