Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff
Authors: Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S. Weld, Walter S. Lasecki, Eric Horvitz
AAAI 2019, pp. 2429-2437
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on three high-stakes classification tasks show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff across different datasets, enabling more compatible yet accurate updates. |
| Researcher Affiliation | Collaboration | Gagan Bansal (University of Washington), Besmira Nushi (Microsoft Research), Ece Kamar (Microsoft Research), Daniel S. Weld (University of Washington), Walter S. Lasecki (University of Michigan), Eric Horvitz (Microsoft Research) |
| Pseudocode | No | The paper defines mathematical equations for loss functions but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We introduce an open-source experimental platform for studying how people model the error boundary of an AI teammate in the presence of updates for an AI-advised decision-making task. The platform exposes important design factors (e.g., task complexity, reward, update type) to the experimenter. Available at https://github.com/gagb/caja |
| Open Datasets | Yes | Datasets. To investigate whether a tradeoff exists between performance and compatibility of an update, we simulate updates to classifiers for three domains: recidivism prediction (Will a convict commit another crime?) (Angwin et al. 2016), in-hospital mortality prediction (Will a patient die in the hospital?) (Johnson et al. 2016; Harutyunyan et al. 2017), and credit risk assessment (Will a borrower fail to pay back?). |
| Dataset Splits | No | The paper mentions training on specific numbers of examples (200 and 5000) and refers generally to a 'validation set', but does not provide explicit training/validation/test split percentages, absolute counts for the distinct sets, or a split methodology for reproduction beyond the training-set sizes. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or types of computing instances used for the experiments. |
| Software Dependencies | No | The paper mentions machine learning models (logistic regression, MLP) and loss functions, but does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the implementation or experiments. |
| Experiment Setup | No | The paper describes varying the λc parameter in their reformulated objective, but does not provide specific hyperparameter values such as learning rates, batch sizes, epochs, or optimizer settings used for training the models in their experiments. |
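The re-training objective summarized above (a standard loss plus a λc-weighted penalty on "new errors," i.e., examples the previous model classified correctly) can be sketched as follows. This is an illustrative NumPy formulation under assumed names (`compatible_loss`, `lam_c`), not the authors' released implementation; setting `lam_c = 0` recovers the plain cross-entropy loss.

```python
import numpy as np

def compatible_loss(y_true, p_new, old_correct, lam_c=0.5, eps=1e-12):
    """Cross-entropy plus a compatibility penalty.

    y_true      : array of binary labels (0/1)
    p_new       : updated model's predicted probabilities for class 1
    old_correct : 1.0 where the *previous* model was correct, else 0.0
    lam_c       : tradeoff weight; larger values penalize new errors more
    """
    p_new = np.clip(p_new, eps, 1 - eps)
    # Standard per-example binary cross-entropy.
    ce = -(y_true * np.log(p_new) + (1 - y_true) * np.log(1 - p_new))
    # Up-weight the loss only where the old model was already correct,
    # discouraging the update from introducing errors there.
    return np.mean(ce + lam_c * old_correct * ce)
```

Sweeping `lam_c` traces out the performance/compatibility curve described in the paper: higher values yield updates that preserve more of the old model's correct predictions, possibly at some cost in overall accuracy.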