Implications of Model Indeterminacy for Explanations of Automated Decisions
Authors: Marc-Etienne Brunet, Ashton Anderson, Richard Zemel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To explore the extent to which model indeterminacy may impact the consistency of explanations in a practical setting, we conduct a series of experiments. |
| Researcher Affiliation | Academia | Marc-Etienne Brunet, University of Toronto, Vector Institute, mebrunet@cs.toronto.edu; Ashton Anderson, University of Toronto, Vector Institute, ashton@cs.toronto.edu; Richard Zemel, University of Toronto, Columbia University, Vector Institute, zemel@cs.toronto.edu |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not include a structured pseudocode or algorithm block, nor a section explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | No | Experimental source code will be made available at github.com/mebrunet/model-indeterminacy |
| Open Datasets | Yes | We use three different (binary) risk assessment datasets (all available on Kaggle): UCI Credit Card [35], Give Me Some Credit, and Porto Seguro's Safe Driver Prediction. Their details can be found in Appendix B.1. |
| Dataset Splits | Yes | We first split each dataset into a development and a holdout set (70 / 30), and apply one-hot encoding and standard scaling. We then run a model selection process with three model classes: logistic regression (LR), multi-layer perceptron (MLP), and a tabular ResNet (TRN) recently proposed by Gorishniy et al. [10]. We sweep through a range of hyperparameter settings, trying a total of 408 model-hyperparameter configurations per dataset. For each configuration, we pick a random seed and use it to control a shuffled split of the development dataset into train and validation sets (70 / 30). (A hypothetical sketch of this splitting procedure follows the table.) |
| Hardware Specification | No | Our experiments were conducted on a GPU accelerated computing cluster. |
| Software Dependencies | No | ML models were written in PyTorch [26], and the analysis used NumPy [12] and Matplotlib [13]. |
| Experiment Setup | Yes | We sweep through a range of hyperparameter settings, trying a total of 408 model-hyperparameter configurations per dataset. For each configuration, we pick a random seed and use it to control a shuffled split of the development dataset into train and validation sets (70 / 30). This seed also controls the randomness used in training (optimization). We fit the models using Adam [15] with a patience-based stopping criterion on the validation set. We also up-weight the rare class, creating a balanced loss. We repeat this process with 3 random seeds per configuration, obtaining a total of 1224 model instances per dataset. (A hypothetical training sketch follows the table.) |
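
The split-and-preprocess pipeline quoted in the Dataset Splits row (a fixed 70/30 development/holdout split, one-hot encoding plus standard scaling, then a seed-controlled 70/30 train/validation split) could look roughly like the following scikit-learn sketch. The function name, column arguments, and the fixed `random_state=0` for the development/holdout split are illustrative assumptions, not taken from the authors' code.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

def prepare_splits(df, target_col, categorical_cols, numeric_cols, seed):
    """df is a pandas DataFrame holding one of the risk assessment datasets."""
    X, y = df.drop(columns=[target_col]), df[target_col]

    # Fixed development / holdout split (70 / 30), shared by all model instances.
    X_dev, X_hold, y_dev, y_hold = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # One-hot encode categorical features and standard-scale numeric ones,
    # fitting the transforms on the development data only.
    preproc = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("scale", StandardScaler(), numeric_cols),
    ])
    X_dev = preproc.fit_transform(X_dev)
    X_hold = preproc.transform(X_hold)

    # Seed-controlled shuffled train / validation split (70 / 30) of the
    # development set; the same seed also drives training randomness.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_dev, y_dev, test_size=0.30, random_state=seed, shuffle=True
    )
    return (X_tr, y_tr), (X_val, y_val), (X_hold, y_hold)
```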
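
The training procedure quoted in the Experiment Setup row (Adam, a loss that up-weights the rare class, and patience-based early stopping on the validation set) could look roughly like the PyTorch sketch below. The hyperparameter values, function names, and the choice of `BCEWithLogitsLoss` with `pos_weight` are assumptions for illustration; the paper does not specify these details.

```python
import copy
import torch
import torch.nn as nn

def train_model(model, train_loader, val_loader, pos_weight, seed,
                lr=1e-3, max_epochs=200, patience=10):
    torch.manual_seed(seed)  # the configuration's seed also controls training randomness
    # Up-weight the rare (positive) class to create a balanced loss.
    criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([pos_weight]))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    best_val, best_state, epochs_since_best = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb).squeeze(-1), yb.float())
            loss.backward()
            optimizer.step()

        # Patience-based stopping criterion on the validation set.
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                criterion(model(xb).squeeze(-1), yb.float()).item()
                for xb, yb in val_loader
            )
        if val_loss < best_val:
            best_val, best_state = val_loss, copy.deepcopy(model.state_dict())
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break

    model.load_state_dict(best_state)  # restore the best validation checkpoint
    return model
```

Repeating such a call with 3 seeds for each of the 408 configurations would yield the 1224 model instances per dataset described above.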