Accuracy, Interpretability, and Differential Privacy via Explainable Boosting
Authors: Harsha Nori, Rich Caruana, Zhiqi Bu, Judy Hanwen Shen, Janardhan Kulkarni
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on multiple classification and regression datasets show that DP-EBM models suffer surprisingly little accuracy loss even with strong differential privacy guarantees. |
| Researcher Affiliation | Collaboration | 1Microsoft, Redmond, USA. 2University of Pennsylvania, Philadelphia, USA. 3Stanford University, Palo Alto, USA. |
| Pseudocode | Yes | "Algorithm 1 Explainable Boosting" (a Python sketch of this cyclic boosting loop appears after the table) |
| Open Source Code | Yes | "We extend the EBM package to include DP-EBMs [1], which makes DP-EBMs as easy to use as regular EBMs or any scikit-learn model." [1] https://github.com/interpretml/interpret (see the usage sketch after the table) |
| Open Datasets | Yes | "The datasets used in these experiments (with the exception of the healthcare data, which contains real patient data) are publicly available and summarized in Table 1." and "Dua, D. and Graff, C. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml." |
| Dataset Splits | No | "To evaluate performance, we generate 25 randomly drawn 80/20 train-test splits and report the average test-set accuracy and standard deviation at varying ε and fixed δ = 10⁻⁶." (a sketch of this evaluation protocol appears after the table) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | "We extend the EBM package to include DP-EBMs [1], which makes DP-EBMs as easy to use as regular EBMs or any scikit-learn model." and "For both models, we use IBM's differential privacy library (Holohan, 2019)." |
| Experiment Setup | Yes | "We use the following (default) parameters for all experiments: max bins = 32, learning rate = 0.01, n epochs = 300, max leaves = 3, with 10% of the total privacy budget allocated to binning and 90% to training." (these defaults also appear in the usage sketch after the table) |
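
The quoted Algorithm 1 trains an EBM by round-robin gradient boosting: each epoch fits one small tree per feature on the current residuals, so every feature accumulates its own additive shape function. The following is a minimal sketch of that loop, assuming squared-error residuals and scikit-learn's `DecisionTreeRegressor` as the base learner; the function names here are illustrative, not the interpret package's internals.

```python
# Minimal sketch of the cyclic boosting loop behind Algorithm 1
# (Explainable Boosting). Illustrative only; not the package's code.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_ebm(X, y, n_epochs=300, learning_rate=0.01, max_leaves=3):
    """Train one additive shape function per feature by round-robin boosting."""
    n, d = X.shape
    ensembles = [[] for _ in range(d)]  # one tiny-tree ensemble per feature
    prediction = np.zeros(n)
    for _ in range(n_epochs):
        for j in range(d):
            residual = y - prediction  # squared-error gradient
            tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves)
            tree.fit(X[:, [j]], residual)  # tree sees only feature j
            ensembles[j].append(tree)
            prediction += learning_rate * tree.predict(X[:, [j]])
    return ensembles

def predict_ebm(ensembles, X, learning_rate=0.01):
    """Sum each feature's shape function to form the additive prediction."""
    pred = np.zeros(X.shape[0])
    for j, trees in enumerate(ensembles):
        for tree in trees:
            pred += learning_rate * tree.predict(X[:, [j]])
    return pred
```

Because each tree sees only a single feature, the fitted model stays an additive sum of per-feature shape functions, which is what makes EBMs directly interpretable.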
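Since the DP-EBM extension ships in the interpret package with a scikit-learn-style interface, a usage sketch might look like the following. The class name `DPExplainableBoostingClassifier` and the keyword arguments mirror the package's published EBM API, but exact names can vary across versions, so treat them as assumptions rather than a verified snippet.

```python
# Hedged usage sketch of DP-EBM from the interpret package
# (https://github.com/interpretml/interpret). Parameter names are
# assumptions based on the package's scikit-learn-style EBM API
# and may differ between versions.
from interpret.privacy import DPExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Defaults quoted in the Experiment Setup row: max bins = 32,
# learning rate = 0.01; epsilon varies per experiment while
# delta is fixed at 1e-6, as in the paper's evaluation.
dp_ebm = DPExplainableBoostingClassifier(
    epsilon=1.0,
    delta=1e-6,
    max_bins=32,
    learning_rate=0.01,
)
dp_ebm.fit(X_train, y_train)
print("test accuracy:", dp_ebm.score(X_test, y_test))
```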
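The Dataset Splits row quotes an evaluation protocol of 25 randomly drawn 80/20 train-test splits with the mean and standard deviation of test accuracy reported. A minimal sketch of that harness follows; it accepts any scikit-learn-style estimator, and the `model_factory` argument and function name are illustrative.

```python
# Sketch of the quoted evaluation protocol: 25 random 80/20
# train-test splits, reporting mean and standard deviation of
# test accuracy. Any scikit-learn-style estimator (e.g. a DP-EBM)
# slots in via model_factory.
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(model_factory, X, y, n_splits=25, test_size=0.2):
    scores = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = model_factory()  # fresh model per split
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    return float(np.mean(scores)), float(np.std(scores))
```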