reproducibilityindex.ai

A hierarchical decomposition for explaining ML performance discrepancies

Authors: Harvineet Singh, Fan Xia, Adarsh Subbaswamy, Alexej Gossmann, Jean Feng

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the utility of our framework in real-world examples of prediction models for hospital readmission and insurance coverage. Code for reproducing experiments is available at https://github.com/jjfeng/HDPD.
Researcher Affiliation	Collaboration	Harvineet Singh¹, Fan Xia¹, Adarsh Subbaswamy², Alexej Gossmann², Jean Feng¹; ¹University of California, San Francisco; ²U.S. Food and Drug Administration, Center for Devices and Radiological Health
Pseudocode	Yes	Algorithm 1 Aggregate decompositions into baseline, conditional covariate, and conditional outcome shifts; Algorithm 2 ValueConditionalOutcome(s): Value for s-partial conditional outcome shift for a subset s; Algorithm 3 ValueConditionalCovariate(s): Value for s-partial conditional covariate shift for a subset s; Algorithm 4 Detailed decomposition for conditional outcome and covariate shift
Open Source Code	Yes	Code for reproducing experiments is available at https://github.com/jjfeng/HDPD.
Open Datasets	Yes	We analyze a neural network trained to predict whether a person has public health insurance using data from Nebraska in the American Community Survey (source, n = 3000), applied to data from Louisiana (target, n = 6000).
Dataset Splits	Yes	Let the data be randomly split into training and evaluation partitions. ... We fit all models on 80% of the data points from both source and target datasets which is the Tr partition, and keep the remaining 20% for computing the estimators which is the Ev partition.
Hardware Specification	Yes	All experiments are run on a 2.60 GHz processor with 8 CPU cores.
Software Dependencies	No	The paper mentions using 'scikit-learn implementations' but does not specify version numbers for any software dependencies like scikit-learn, Python, or other libraries.
Experiment Setup	Yes	We use 3-fold cross validation to select models. ... We clip the predicted probabilities from the density model for π at 10^{-6} to avoid very large density weights. ... Specific hyperparameter ranges for the grid search are provided in the code.
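The Dataset Splits and Experiment Setup rows describe an 80/20 Tr/Ev split, 3-fold cross-validated model selection, and clipping of density-model probabilities. The sketch below illustrates those three steps together under stated assumptions; it is not code from the HDPD repository. Assumptions: π is taken to be a classifier's predicted probability that a point comes from the target domain; the density model is scikit-learn's `LogisticRegression` with a hypothetical grid over `C`; clipping is applied on both sides; and synthetic Gaussian data stands in for the ACS and readmission datasets. The helper names `split_tr_ev`, `fit_domain_classifier`, and `clipped_density_weights` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split


def split_tr_ev(X, y, seed=0):
    """Randomly split one dataset into an 80% Tr partition and a 20% Ev partition."""
    return train_test_split(X, y, test_size=0.2, random_state=seed)


def fit_domain_classifier(X_source, X_target):
    """Fit P(target | x) on pooled Tr data; choose C by 3-fold CV (hypothetical grid)."""
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical, not the paper's grid
        cv=3,
        scoring="neg_log_loss",
    )
    search.fit(X, d)
    return search.best_estimator_


def clipped_density_weights(model, X, n_source, n_target, eps=1e-6):
    """Estimate p_target(x) / p_source(x) from clipped classifier probabilities."""
    pi = model.predict_proba(X)[:, 1]
    # Assumption: clip on both sides so the odds pi / (1 - pi) stay bounded.
    pi = np.clip(pi, eps, 1 - eps)
    # Correct the odds for the source/target class ratio used to train the classifier.
    return (pi / (1 - pi)) * (n_source / n_target)


# Synthetic stand-in data (sizes echo the ACS example; values are made up).
rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(3000, 2))
X_tgt = rng.normal(0.5, 1.0, size=(6000, 2))
y_src = (X_src[:, 0] + rng.normal(size=3000) > 0).astype(int)
y_tgt = (X_tgt[:, 0] + rng.normal(size=6000) > 0).astype(int)

# Split each dataset into Tr (model fitting) and Ev (estimator computation).
Xs_tr, Xs_ev, ys_tr, ys_ev = split_tr_ev(X_src, y_src)
Xt_tr, Xt_ev, yt_tr, yt_ev = split_tr_ev(X_tgt, y_tgt)

# Fit the density model on Tr only, then compute weights on the source Ev partition.
clf = fit_domain_classifier(Xs_tr, Xt_tr)
w = clipped_density_weights(clf, Xs_ev, n_source=len(Xs_tr), n_target=len(Xt_tr))
print(Xs_tr.shape, Xs_ev.shape, round(float(w.max()), 3))
```

With eps = 10^{-6}, the clipped odds can never exceed (1 − 10^{-6}) / 10^{-6}, roughly 10^6, so a single point cannot dominate a weighted average without bound. Fitting on Tr and weighting only Ev points keeps the density model and the downstream estimator on separate data, matching the split described in the Dataset Splits row.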