Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
Authors: Mike Wu, Michael Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power. |
| Researcher Affiliation | Academia | (1) Stanford University, wumike@cs.stanford.edu; (2) Harvard University SEAS, mike@michaelchughes.com, finale@seas.harvard.edu; (3) University of Basel, {sonali.parbhoo,volker.roth}@unibas.ch; (4) University of Siena, maurizio.zazzi@unisi.it |
| Pseudocode | Yes | Algorithm 1 Average-Path-Length Cost Function. Require: ŷ(·, W): binary prediction function with parameters W; D = {x_n}, n = 1…N: reference dataset with N examples. 1: function Ω(W) 2: tree ← TRAINTREE({x_n, ŷ(x_n, W)}) 3: return (1/N) Σ_n PATHLENGTH(tree, x_n). (A runnable sketch appears below the table.) |
| Open Source Code | Yes | We have released an open-source Python toolbox to allow others to experiment with tree regularization: http://github.com/dtak/tree-regularization-public |
| Open Datasets | Yes | We study time-series data for 11,786 septic ICU patients from the public MIMIC III dataset (Johnson et al. 2016). We use the EuResist Integrated Database (Zazzi et al. 2012) for 53,236 patients diagnosed with HIV. We have recordings of 630 speakers... (Garofolo and others 1993). |
| Dataset Splits | Yes | Sepsis Critical Care: ...7,070 patients are used in training, 1,769 for validation, and 294 for test. HIV Therapy Outcome (HIV): ...37,618 patients are used for training; 7,986 for testing, and 7,632 for validation. Phonetic Speech (TIMIT): ...6,303 sequences, split into 3,697 for training, 925 for validation, and 1,681 for testing. |
| Hardware Specification | No | The paper mentions that computations were supported by "the FAS Research Computing Group at Harvard and sciCORE (http://scicore.unibas.ch/) scientific computing core facility at University of Basel" but does not provide specific hardware details such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software like "Python's scikit-learn" and "Autograd" but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The objective in equation 1 was optimized via Adam gradient descent (Kingma and Ba 2014) using a batch size of 100 and a learning rate of 1e-3 for 250 epochs, and hyperparameters were set via cross validation using grid search (see supplement for full experimental details). Optimization of our surrogate objective is done via gradient descent. We use Autograd to compute gradients of the loss in Eq. (5) with respect to ξ, then use Adam to compute descent directions with step sizes set to 0.01 for toy datasets and 0.001 for real-world datasets. (A sketch of the surrogate fitting step also appears below the table.) |
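
The Algorithm 1 pseudocode quoted above maps directly onto standard decision-tree tooling. Below is a minimal Python sketch of the average-path-length cost Ω(W), assuming scikit-learn's `DecisionTreeClassifier` as the TRAINTREE step; the function name `average_path_length`, the thresholding at 0.5, and the `min_samples_leaf` choice are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of Algorithm 1: train a tree to mimic the deep model's
# predictions on a reference dataset, then average the decision-path lengths.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_path_length(predict_fn, W, X, min_samples_leaf=10):
    """Omega(W): mean root-to-leaf path length of a tree fit to
    {(x_n, y_hat(x_n, W))} over the reference examples X."""
    y_hat = (predict_fn(X, W) > 0.5).astype(int)        # binarize the deep model's predictions
    tree = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf)
    tree.fit(X, y_hat)                                   # TRAINTREE step
    # decision_path returns, per example, an indicator over the tree nodes it
    # visits; the row sum is that example's path length (in nodes).
    node_indicator = tree.decision_path(X)
    path_lengths = np.asarray(node_indicator.sum(axis=1)).ravel()
    return float(path_lengths.mean())
```

A deeper mimic tree means longer average paths and hence a larger value of Ω(W), which is exactly what the tree regularizer penalizes.

The experiment-setup quote also describes fitting a differentiable surrogate for Ω(W) with Autograd and Adam. The sketch below shows that fitting step under stated assumptions: the one-hidden-layer MLP architecture, the dummy training pairs (W_j, Ω(W_j)), and all variable names are illustrative; only the squared-error fit and the Adam step sizes (0.01 toy, 0.001 real-world) come from the paper.

```python
# Minimal sketch of fitting the surrogate Omega_hat(W; xi) by gradient descent,
# using Autograd for gradients and its Adam optimizer, per the setup quote.
import numpy as np
import autograd.numpy as anp
from autograd import grad
from autograd.misc.optimizers import adam

rng = np.random.RandomState(0)
D, J, H = 50, 20, 25                      # weight dim, observed W's, hidden units (assumed)
W_samples = rng.randn(J, D)               # flattened deep-model weight vectors (dummy data)
omega_true = rng.rand(J) * 10.0           # their true path lengths from Algorithm 1 (dummy data)

xi_init = [0.1 * rng.randn(D, H), np.zeros(H),   # hidden-layer weights and bias
           0.1 * rng.randn(H), 0.0]              # output-layer weights and bias

def surrogate_predict(xi, W_batch):
    """One-hidden-layer MLP Omega_hat(W; xi), applied row-wise (architecture assumed)."""
    A, b, c, d = xi
    h = anp.tanh(anp.dot(W_batch, A) + b)
    return anp.dot(h, c) + d

def surrogate_loss(xi, step):
    preds = surrogate_predict(xi, W_samples)
    return anp.mean((preds - omega_true) ** 2)   # squared-error fit to the true costs

# Step size 1e-3 matches the paper's real-world setting (1e-2 for toy data).
xi_opt = adam(grad(surrogate_loss), xi_init, step_size=1e-3, num_iters=250)
```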
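Because the surrogate is an ordinary feed-forward network in ξ, its gradients with respect to the deep model's weights are available by the chain rule, which is what lets the tree-regularization penalty be used inside standard gradient-based training.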