Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes
Authors: Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across three modalities (text, images, speech) and eleven datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. The analysis draws upon a large-scale audit [HAPI; Chen et al., 2022a] covering three commercial systems per modality. (A toy sketch of this systemic-failure computation appears after the table.) |
| Researcher Affiliation | Academia | Connor Toups (Stanford University), Rishi Bommasani (Stanford University), Kathleen A. Creel (Northeastern University), Sarah H. Bana (Chapman University), Dan Jurafsky (Stanford University), Percy Liang (Stanford University) |
| Pseudocode | No | The paper describes its analytical framework and findings but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code is available at https://github.com/rishibommasani/EcosystemLevelAnalysis. |
| Open Datasets | Yes | To establish general trends made visible through ecosystem-level analysis, we draw upon a large-scale three-year audit of commercial ML APIs [HAPI; Chen et al., 2022a] to study the behavior of deployed ML systems across three modalities, eleven datasets, and nine commercial systems. We compare outcomes from prominent dermatology models and board-certified dermatologists on the DDI dataset [Daneshjou et al., 2022]. |
| Dataset Splits | No | The paper discusses using 'evaluation datasets' and mentions 'the test set of FER+,' but it does not specify explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for reproducing its analysis setup. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct its analysis or computations. |
| Software Dependencies | No | The paper does not list version numbers for any software libraries, frameworks, or programming languages used in its analysis. |
| Experiment Setup | No | The paper describes its analytical setup, such as defining 'potential improvements' and 'improvements' based on changes in model behavior. However, it does not specify typical experimental details like hyperparameters, optimizers, or training configurations, as it is an analysis paper rather than one proposing a new model that requires training. (See the improvement-rate sketch after the table.) |
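
To make the systemic-failure notion from the Research Type row concrete, here is a minimal Python sketch. This is not the authors' released code: the failure matrix is synthetic, and the 15% per-model failure rate is an arbitrary assumption. It compares the observed rate at which all models fail the same instance against the rate expected if model failures were independent; homogeneous outcomes correspond to the observed rate substantially exceeding that baseline.

```python
import numpy as np

# Hypothetical failure matrix: rows are evaluation instances, columns are
# the k commercial models; True means that model misclassifies the instance.
# (Synthetic stand-in -- in the paper this would be derived from HAPI.)
rng = np.random.default_rng(0)
failures = rng.random((1000, 3)) < 0.15  # assumed 15% failure rate per model

# Observed systemic failure rate: fraction of instances every model fails.
observed = failures.all(axis=1).mean()

# Baseline under independent failures: product of per-model failure rates.
per_model = failures.mean(axis=0)
baseline = per_model.prod()

print(f"per-model failure rates:   {per_model.round(3)}")
print(f"observed systemic failure: {observed:.4f}")
print(f"independence baseline:     {baseline:.4f}")
# Homogeneous outcomes show up as observed well above the baseline;
# this synthetic data, being independent by construction, will not.
```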
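Similarly, for the Experiment Setup row, one hedged reading of the paper's 'potential improvements' (instances the old model misclassified) and 'improvements' (previously misclassified instances the updated model gets right) could be computed as follows; the function name, arrays, and values are all hypothetical.

```python
import numpy as np

def improvement_stats(labels, old_preds, new_preds):
    """Count potential improvements and realized improvements across a
    model update (a hypothetical reading of the paper's definitions)."""
    labels, old_preds, new_preds = map(np.asarray, (labels, old_preds, new_preds))
    potential = old_preds != labels               # 'potential improvements'
    improved = potential & (new_preds == labels)  # 'improvements'
    rate = improved.sum() / max(potential.sum(), 1)
    return int(potential.sum()), int(improved.sum()), rate

labels    = np.array([0, 1, 1, 0, 1, 0])
old_preds = np.array([0, 0, 1, 1, 0, 0])  # misclassifies instances 1, 3, 4
new_preds = np.array([0, 1, 1, 1, 0, 0])  # the update fixes instance 1 only
print(improvement_stats(labels, old_preds, new_preds))  # (3, 1, 0.333...)
```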