Beyond the Norms: Detecting Prediction Errors in Regression Models

Authors: Andres Altieri, Marco Romanelli, Georg Pichler, Florence Alberge, Pablo Piantanida

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems.
Researcher Affiliation | Academia | 1Laboratoire des signaux et systèmes (L2S), Université Paris-Saclay, CNRS, CentraleSupélec, Gif-sur-Yvette, France; 2New York University, New York, NY, USA; 3Institute of Telecommunications, TU Wien, Vienna, Austria; 4Systèmes et applications des technologies de l'information et de l'énergie (SATIE), CNRS, Université Paris-Saclay, Gif-sur-Yvette, France; 5International Laboratory on Learning Systems (ILLS) and Quebec AI Institute (Mila), McGill, ÉTS, CNRS, Université Paris-Saclay, CentraleSupélec, Montreal (QC), Canada. Correspondence to: Andres Altieri <andres.altieri@centralesupelec.fr>, Marco Romanelli <mr6852@nyu.edu>.
Pseudocode | Yes | Algorithm 1: Baseline based on the estimation of the conditional distribution of Y|X = x; Algorithm 2: DV-Y, diversity discriminator based on the estimates of the distribution of Y|X = x; Algorithm 3: Baseline based on the estimation of the conditional distribution of D(Y, X)|X = x; Algorithm 4: DV-Y-D, diversity discriminator based on the estimation of the conditional distribution of D(Y, X)|X = x.
Open Source Code | Yes | Our code is available at https://zenodo.org/records/11281964.
Open Datasets | Yes | We consider 8 well-known UCI (Kelly et al.) regression datasets that have been extensively used in uncertainty quantification (Chung et al., 2021; Gal & Ghahramani, 2016; Hernández-Lobato & Adams, 2015; Tagasovska & Lopez-Paz, 2019).
Dataset Splits | Yes | For each dataset and seed, we initially train a regressor using a consistent neural architecture featuring three hidden layers and 64 neurons. This involves employing 5-fold cross-validation on the learning rate and weight decay, with 90% of the data serving as the training set and the remaining 10% as a test set for both the regressors and the detectors. Subsequently, the training set of the regressor is reused to train estimators for the conditional distributions of Y|X = x and D|X = x... The best parameters are chosen to maximize the AUROC of the validation set.
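The splitting protocol quoted above (a 10% held-out test set, with the remaining 90% divided into 5 cross-validation folds for tuning learning rate and weight decay) can be sketched as follows. This is an illustrative reconstruction, not the paper's released code; the function name and signature are our own.

```python
import numpy as np

def split_and_cv_folds(n_samples, n_folds=5, test_frac=0.10, seed=0):
    """Sketch of the described protocol: hold out 10% of the data as a
    test set, then partition the remaining 90% into 5 folds for
    cross-validation over the hyperparameters."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)

    # Held-out test set: 10% of all samples.
    n_test = int(round(test_frac * n_samples))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # 5-fold CV splits over the training portion only.
    folds = np.array_split(train_idx, n_folds)
    cv_splits = []
    for k in range(n_folds):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        cv_splits.append((trn, val))
    return train_idx, test_idx, cv_splits
```

Each `(trn, val)` pair covers the full 90% training portion, and the test indices never overlap the training indices, matching the reuse of the regressor's training set for the conditional-distribution estimators.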
Hardware Specification | Yes | In Table 5 we show some examples of training time for the diversity coefficients in different scenarios. This time considers only the training of the diversity coefficients. The time differences are due to the different times required to generate the samples with each conditional model. ... using an NVIDIA V100 GPU with 16GB of RAM.
Software Dependencies | No | The paper mentions components such as the Adam optimizer and batch normalization, and names estimation methods (CG, SQR, KNIFE, RIO), but it does not provide version numbers for any software dependencies.
Experiment Setup | Yes | For each dataset and seed, we initially train a regressor using a consistent neural architecture featuring three hidden layers and 64 neurons. This involves employing 5-fold cross-validation on the learning rate and weight decay... For the diversity detectors DV-Y and DV-D we use a neural network with four hidden layers of 64 neurons. We train for 25 epochs and choose the optimal learning rate using only a validation set of 20% of the data. The best parameters are chosen to maximize the AUROC of the validation set. ... CG: we averaged 10 models and performed 5-fold cross-validation over the learning rate in {0.01, 0.001, 0.0001}. Each model was trained for 150 epochs. SQR: ... learning rate in {10⁻³, 5·10⁻⁴, 10⁻⁴, 5·10⁻⁵, 10⁻⁵} and weight decay in {0, 0.025, 0.05, 0.075, 0.1}. The models were trained for 1000 epochs. KNIFE: ... learning rate in {0.001, 0.0005, 0.0001} and weight decay in {0, 0.0125, 0.025}. The models were trained for 1000 epochs.
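The hyperparameter grids quoted above can be captured as a simple exhaustive search that keeps the configuration maximizing validation AUROC. This is a sketch under assumptions: `evaluate` is a hypothetical stand-in for training a model with a given configuration and scoring AUROC on the validation set; the training loops themselves are elided.

```python
import itertools

# Grids as quoted in the setup description for the baseline estimators.
GRIDS = {
    "CG":    {"lr": [0.01, 0.001, 0.0001], "epochs": 150},
    "SQR":   {"lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
              "weight_decay": [0, 0.025, 0.05, 0.075, 0.1], "epochs": 1000},
    "KNIFE": {"lr": [0.001, 0.0005, 0.0001],
              "weight_decay": [0, 0.0125, 0.025], "epochs": 1000},
}

def best_config(grid, evaluate):
    """Exhaustive search over the grid's list-valued entries, keeping
    the configuration with the highest validation score (AUROC)."""
    keys = [k for k, v in grid.items() if isinstance(v, list)]
    fixed = {k: v for k, v in grid.items() if not isinstance(v, list)}
    best, best_score = None, float("-inf")
    for combo in itertools.product(*(grid[k] for k in keys)):
        cfg = {**fixed, **dict(zip(keys, combo))}
        score = evaluate(cfg)  # e.g. mean AUROC across the CV folds
        if score > best_score:
            best, best_score = cfg, score
    return best, best_score
```

For SQR this enumerates 5 × 5 = 25 learning-rate/weight-decay combinations, each trained for the fixed 1000 epochs, mirroring the cross-validation described above.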