A hierarchical decomposition for explaining ML performance discrepancies

Authors: Harvineet Singh, Fan Xia, Adarsh Subbaswamy, Alexej Gossmann, Jean Feng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the utility of our framework in real-world examples of prediction models for hospital readmission and insurance coverage. Code for reproducing experiments is available at https://github.com/jjfeng/HDPD.
Researcher Affiliation | Collaboration | Harvineet Singh (1), Fan Xia (1), Adarsh Subbaswamy (2), Alexej Gossmann (2), Jean Feng (1); (1) University of California, San Francisco; (2) U.S. Food and Drug Administration, Center for Devices and Radiological Health
Pseudocode | Yes | Algorithm 1: Aggregate decompositions into baseline, conditional covariate, and conditional outcome shifts; Algorithm 2: ValueConditionalOutcome(s), the value for the s-partial conditional outcome shift for a subset s; Algorithm 3: ValueConditionalCovariate(s), the value for the s-partial conditional covariate shift for a subset s; Algorithm 4: Detailed decomposition for conditional outcome and covariate shift. [An illustrative sketch of the aggregate decomposition appears below the table.]
Open Source Code | Yes | Code for reproducing experiments is available at https://github.com/jjfeng/HDPD.
Open Datasets | Yes | We analyze a neural network trained to predict whether a person has public health insurance using data from Nebraska in the American Community Survey (source, n = 3000), applied to data from Louisiana (target, n = 6000).
Dataset Splits | Yes | Let the data be randomly split into training and evaluation partitions. ... We fit all models on 80% of the data points from both source and target datasets, which is the Tr partition, and keep the remaining 20% for computing the estimators, which is the Ev partition. [Loading the ACS data and forming this 80/20 split is sketched below the table.]
Hardware Specification | Yes | All experiments are run on a 2.60 GHz processor with 8 CPU cores.
Software Dependencies | No | The paper mentions using 'scikit-learn implementations' but does not specify version numbers for any software dependencies like scikit-learn, Python, or other libraries.
Experiment Setup | Yes | We use 3-fold cross validation to select models. ... We clip the predicted probabilities from the density model for π at 10^{-6} to avoid very large density weights. ... Specific hyperparameter ranges for the grid search are provided in the code. [The cross-validation and clipping choices are illustrated in a sketch below the table.]
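
The Open Datasets and Dataset Splits rows describe a public-coverage prediction task built from American Community Survey data, with Nebraska as the source domain, Louisiana as the target, and an 80/20 split into Tr (model fitting) and Ev (estimator) partitions. The sketch below shows one way to set this up; the use of the folktables package, the 2018 1-Year survey, and the random subsampling to n = 3000 / n = 6000 are assumptions made here for illustration, since the report does not spell out the paper's exact extraction pipeline.

```python
# Sketch of loading the ACS public-coverage task and forming the Tr/Ev split described above.
# Assumptions: folktables package, 2018 1-Year survey, and random subsampling to the stated sizes.
import numpy as np
from folktables import ACSDataSource, ACSPublicCoverage
from sklearn.model_selection import train_test_split

def load_state(state, n, seed=0):
    """Load ACS person records for one state and subsample n rows (assumed step)."""
    src = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
    df = src.get_data(states=[state], download=True)
    X, y, _ = ACSPublicCoverage.df_to_numpy(df)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=n, replace=False)
    return X[idx], y[idx]

X_src, y_src = load_state("NE", n=3000)   # Nebraska = source
X_tgt, y_tgt = load_state("LA", n=6000)   # Louisiana = target

# 80% of each domain goes to the Tr partition (model fitting),
# 20% to the Ev partition (computing the decomposition estimators).
Xs_tr, Xs_ev, ys_tr, ys_ev = train_test_split(X_src, y_src, test_size=0.2, random_state=0)
Xt_tr, Xt_ev, yt_tr, yt_ev = train_test_split(X_tgt, y_tgt, test_size=0.2, random_state=0)
```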
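
The Experiment Setup row mentions 3-fold cross-validation for model selection and clipping the density model's predicted probabilities at 10^{-6}. The snippet below illustrates both choices on placeholder data; the estimator and hyperparameter grid are stand-ins chosen here, not the grids from the paper's code.

```python
# Illustrative sketch of 3-fold CV model selection and probability clipping for the
# density model pi(target | x). Data, estimator, and grid are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the Tr partition of the pooled source/target sample;
# d_tr indicates whether a row comes from the target domain.
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(500, 10))
d_tr = rng.integers(0, 2, size=500)

# 3-fold cross-validation over a stand-in hyperparameter grid.
param_grid = {"n_estimators": [100, 400], "max_depth": [3, 6, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_tr, d_tr)

# Clip predicted probabilities at 1e-6 so the ratio pi / (1 - pi) cannot produce
# extremely large importance weights.
pi_hat = search.best_estimator_.predict_proba(X_tr)[:, 1]
pi_hat = np.clip(pi_hat, 1e-6, 1 - 1e-6)
weights = pi_hat / (1 - pi_hat)
```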
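
The aggregate decomposition named in the Pseudocode row swaps the baseline, conditional covariate, and conditional outcome distributions one at a time, so that the source-to-target performance gap telescopes into three shift terms. The sketch below is a minimal plug-in illustration of that idea using importance weights from a domain classifier; it is not the paper's estimator, and the split of covariates into baseline variables Z and remaining variables W, as well as the self-normalized weighting, are assumptions made here (see https://github.com/jjfeng/HDPD for the actual implementation).

```python
# Minimal sketch (not the paper's estimator) of an aggregate decomposition of a
# performance gap into baseline, conditional covariate, and conditional outcome
# shift terms via importance weighting with a domain classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_weights(feats_src, feats_tgt, clip=1e-6):
    """Estimate p_target(features)/p_source(features) on the source rows
    with a probabilistic domain classifier pi(target | features)."""
    X = np.vstack([feats_src, feats_tgt])
    d = np.concatenate([np.zeros(len(feats_src)), np.ones(len(feats_tgt))])
    pi = LogisticRegression(max_iter=1000).fit(X, d)
    p = np.clip(pi.predict_proba(feats_src)[:, 1], clip, 1 - clip)
    w = (p / (1 - p)) * (len(feats_src) / len(feats_tgt))
    return w / w.mean()  # self-normalize the weights

def aggregate_decomposition(loss_src, loss_tgt, Z_src, Z_tgt, X_src, X_tgt):
    """Telescope E_target[loss] - E_source[loss] into three shift terms.
    loss_*: per-example losses of a fixed model on each domain.
    Z_*: baseline covariates only; X_*: all covariates (Z plus the rest, W)."""
    w_z = density_ratio_weights(Z_src, Z_tgt)   # swap p(Z) only
    w_x = density_ratio_weights(X_src, X_tgt)   # swap p(Z, W) = p(X)
    e_src = loss_src.mean()
    e_swap_z = np.average(loss_src, weights=w_z)
    e_swap_x = np.average(loss_src, weights=w_x)
    e_tgt = loss_tgt.mean()
    return {
        "baseline shift": e_swap_z - e_src,
        "conditional covariate shift": e_swap_x - e_swap_z,
        "conditional outcome shift": e_tgt - e_swap_x,
        "total gap": e_tgt - e_src,
    }
```

Calling aggregate_decomposition with per-example losses of a fixed model on the Ev partitions of both domains returns the three shift terms, which by construction sum to the total performance gap.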