Implications of Model Indeterminacy for Explanations of Automated Decisions

Authors: Marc-Etienne Brunet, Ashton Anderson, Richard Zemel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To explore the extent to which model indeterminacy may impact the consistency of explanations in a practical setting, we conduct a series of experiments.
Researcher Affiliation | Academia | Marc-Etienne Brunet (University of Toronto, Vector Institute), mebrunet@cs.toronto.edu; Ashton Anderson (University of Toronto, Vector Institute), ashton@cs.toronto.edu; Richard Zemel (University of Toronto, Columbia University, Vector Institute), zemel@cs.toronto.edu
Pseudocode | No | The paper describes methods and mathematical formulations but does not contain a structured pseudocode or algorithm block, nor is there a section explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | No | Experimental source code will be made available at github.com/mebrunet/model-indeterminacy
Open Datasets | Yes | We use three different (binary) risk assessment datasets (all available on Kaggle): UCI Credit Card [35], Give Me Some Credit, and Porto Seguro's Safe Driver Prediction. Their details can be found in Appendix B.1.
Dataset Splits | Yes | We first split each dataset into a development and a holdout set (70/30), and apply one-hot encoding and standard scaling. We then run a model selection process with three model classes: logistic regression (LR), multi-layer perceptron (MLP), and a tabular ResNet (TRN) recently proposed by Gorishniy et al. [10]. We sweep through a range of hyperparameter settings, trying a total of 408 model-hyperparameter configurations per dataset. For each configuration, we pick a random seed and use it to control a shuffled split of the development dataset into train and validation sets (70/30). (See the split-and-preprocessing sketch after the table.)
Hardware Specification | No | Our experiments were conducted on a GPU-accelerated computing cluster.
Software Dependencies | No | ML models were written in PyTorch [26], and the analysis used NumPy [12] and Matplotlib [13].
Experiment Setup | Yes | We sweep through a range of hyperparameter settings, trying a total of 408 model-hyperparameter configurations per dataset. For each configuration, we pick a random seed and use it to control a shuffled split of the development dataset into train and validation sets (70/30). This seed also controls the randomness used in training (optimization). We fit the models using Adam [15] with a patience-based stopping criterion on the validation set. We also up-weight the rare class, creating a balanced loss. We repeat this process with 3 random seeds per configuration, obtaining a total of 1224 model instances per dataset. (See the training-loop sketch below.)
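
A minimal sketch of the splitting and preprocessing quoted under "Dataset Splits", written with scikit-learn. The function name, the "target" label column, and the fixed holdout seed are illustrative assumptions, not the authors' released code.

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def split_and_preprocess(df, categorical_cols, numeric_cols, config_seed):
    # Illustrative assumption: `df` is a pandas DataFrame with the label in "target".
    X, y = df.drop(columns=["target"]), df["target"]

    # 70/30 split into development and holdout sets (held fixed across configurations here).
    X_dev, X_hold, y_dev, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

    # One-hot encode categorical features and standard-scale numeric ones,
    # fitting the transforms on the development data only.
    preprocessor = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("scale", StandardScaler(), numeric_cols),
    ])
    X_dev = preprocessor.fit_transform(X_dev)
    X_hold = preprocessor.transform(X_hold)

    # The per-configuration seed controls a shuffled 70/30 train/validation split.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_dev, y_dev, test_size=0.3, random_state=config_seed, shuffle=True
    )
    return (X_tr, y_tr), (X_val, y_val), (X_hold, y_hold)

The excerpt does not state whether encoding and scaling are fit before or after the development/holdout split; the sketch fits them on the development set as one plausible reading.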
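The training procedure quoted under "Experiment Setup" (Adam, a balanced loss obtained by up-weighting the rare class, and patience-based early stopping on the validation set) could be sketched in PyTorch as follows. The learning rate, patience, epoch cap, and function signature are placeholders, not the paper's reported configuration.

import copy
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, y_train, seed, lr=1e-3, patience=10):
    torch.manual_seed(seed)  # the same seed also controls training randomness

    # Up-weight the rare (positive) class so the loss is balanced.
    n_pos = float(y_train.sum())
    n_neg = float(len(y_train) - n_pos)
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(n_neg / n_pos))

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, epochs_without_improvement = float("inf"), None, 0

    for _ in range(1000):  # epoch cap is an arbitrary placeholder
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb.float())
            loss.backward()
            optimizer.step()

        # Patience-based stopping criterion on the validation loss.
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                loss_fn(model(xb).squeeze(-1), yb.float()).item()
                for xb, yb in val_loader
            ) / len(val_loader)
        if val_loss < best_val:
            best_val, best_state = val_loss, copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break

    model.load_state_dict(best_state)
    return model

In the paper's protocol this routine would be run for each of the 408 model-hyperparameter configurations with 3 seeds each, yielding the 1224 model instances per dataset.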