Accuracy, Interpretability, and Differential Privacy via Explainable Boosting

Authors: Harsha Nori, Rich Caruana, Zhiqi Bu, Judy Hanwen Shen, Janardhan Kulkarni

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments on multiple classification and regression datasets show that DP-EBM models suffer surprisingly little accuracy loss even with strong differential privacy guarantees." |
| Researcher Affiliation | Collaboration | 1 Microsoft, Redmond, USA. 2 University of Pennsylvania, Philadelphia, USA. 3 Stanford University, Palo Alto, USA. |
| Pseudocode | Yes | Algorithm 1: Explainable Boosting |
| Open Source Code | Yes | "We extend the EBM package to include DP-EBMs, which makes DP-EBMs as easy to use as regular EBMs or any scikit-learn model." (https://github.com/interpretml/interpret) |
| Open Datasets | Yes | "The datasets used in these experiments (with the exception of the healthcare data, which contains real patient data) are publicly available and summarized in Table 1." and "Dua, D. and Graff, C. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml." |
| Dataset Splits | No | "To evaluate performance, we generate 25 randomly drawn 80/20 train-test splits and report the average test-set accuracy and standard deviation at varying ε and fixed δ = 10⁻⁶." |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | "We extend the EBM package to include DP-EBMs, which makes DP-EBMs as easy to use as regular EBMs or any scikit-learn model." and "For both models, we use IBM's differential privacy library (Holohan, 2019)." |
| Experiment Setup | Yes | "We use the following (default) parameters for all experiments: max bins = 32, learning rate = 0.01, n epochs = 300, max leaves = 3, with 10% of the total privacy budget allocated to binning and 90% to training." |
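The evaluation protocol quoted in the Dataset Splits row (25 randomly drawn 80/20 train-test splits, reporting mean and standard deviation of test accuracy) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code; the `train_and_score` callback is a hypothetical stand-in for fitting a model and scoring it on held-out data.

```python
import random
import statistics


def evaluate(dataset, train_and_score, n_splits=25, train_frac=0.8, seed=0):
    """Mean and stdev of test accuracy over repeated random 80/20 splits.

    `train_and_score(train, test)` is a hypothetical callback that fits a
    model on `train` and returns its accuracy on `test`.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = dataset[:]          # fresh shuffle for each split
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        scores.append(train_and_score(train, test))
    return statistics.mean(scores), statistics.stdev(scores)
```

In practice one would pass real feature/label pairs and a scikit-learn-style estimator inside the callback; the paper repeats this for each dataset at several ε values with δ fixed at 10⁻⁶.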
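The Experiment Setup row fixes the default hyperparameters and splits the total privacy budget 10%/90% between feature binning and boosting. A minimal sketch of that allocation, with all names hypothetical (the actual DP-EBM implementation in the interpret package composes the per-round training budget more tightly via Gaussian DP, which this sketch does not model):

```python
# Default hyperparameters quoted in the paper (names are illustrative).
DEFAULTS = {
    "max_bins": 32,
    "learning_rate": 0.01,
    "n_epochs": 300,
    "max_leaves": 3,
}


def split_budget(epsilon_total, binning_frac=0.10):
    """Split the total epsilon: 10% to feature binning, 90% to training.

    Returns (eps_binning, eps_training). This only models the top-level
    split, not the per-round composition used during boosting.
    """
    eps_bin = binning_frac * epsilon_total
    eps_train = epsilon_total - eps_bin
    return eps_bin, eps_train
```

For example, at a total budget of ε = 1.0 this allocates ε = 0.1 to computing bin edges and ε = 0.9 to the boosting rounds.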